A Round of Patching

I just finished an interesting email thread with a user about a scenario that's not at all uncommon.  They were installing a CU on some of their SQL boxes, and evidently a new bug introduced in that CU causes their SSRS reports to render incorrectly.  The sad thing is that this is the kind of thing that should have been caught in testing.  Now of course, I don't expect them to find everything during testing, but something as simple as report rendering should have been caught early on, because the second they turned their reporting structure back on, report rendering started messing up.  Now, he assures me that they tested it, but what exactly did they test?  Did they do any user testing at all, or did they just test the install of the patch itself?  I see that quite often.  I see shops claim to do patch testing, but all they're concerned with is whether the install itself will fail.

Most of the time I blame managers and companies in general for this, because the IT staff knows what's at stake better than anyone but they're not given the time and resources it takes to test properly.  Managers always claim to know the business needs better, yet they never seem to grasp the full implications of putting untested code into production.  All they want to know is: will this bring down my server?  Unfortunately that always means something different to them than it does to us.  What they mean is: will this fail and put us into an emergency situation?  And the answer is probably not.  However, disabling a critical piece of your business does put you into an emergency situation.  And in this case they're lucky it's just report rendering.  What if it had been something more serious that caused some kind of logical corruption in their processing?

I would say that quite often the biggest problems caused by hotfixes are performance-related.  And by not testing a CU or hotfix properly you could bring your entire business to its knees.  You can slow your website down so much that nobody can get any real work done.  And what does that do for the reliability and confidence of your business?  It would be really nice sometimes if we in IT didn’t have to rely solely on people who know nothing about it to get our jobs done properly. 

And what's amazing to me is that companies today still, after all this time and all the mistakes, don't think it's necessary to hire competent people to code and test.  Oh sure, they pay lip service to it all day long, but what do they really do about it?  Do they really slow down their dev cycles enough to do benchmarking and architect solutions properly?  Do they spend the money necessary to ensure that their code and servers are solid by hiring enough competent people?  Because every shop I've been in works its people so hard there's no time for anything unexpected to happen.  I think I've said this before, but now I'm saying it again.  It's time to stop being children about IT.  It's time to stop letting people who know nothing about it make the technical decisions that affect entire projects.  And it's definitely time to stop thinking we can get by without testing things properly.  And that includes functional testing as well as performance benchmarking.

A dumb SSRS problem

I spent about an hour last night working to solve this stupid problem with a report so I thought I’d share it with you guys.  I’ve had this problem before, but for some reason I never remember it until I’ve had to work through it for a long time.

The problem is that I was trying to put a simple graph on the page and I often like to work with really simple test data.  So I chose a column that had the same value all the way down which would give me a straight line on my graph.  The problem is that I had 3 spikes.  There shouldn’t be any spikes;  the data’s all the same.

I laid the data out in a table in the report and it was certainly all the same.  Then I ran the report query in SSMS and verified again that the column only had a single value in it.  So where were the spikes coming from?  As it turns out, the problem was with both my query and the collection.  I was collecting disk data from a server every 10mins and putting it into a table.  What I wanted was to graph today's data by time, and what I forgot to do was limit the query to just today.  And since the data gets collected at more or less the same times every day, some of yesterday's collection times matched today's, so SSRS was adding those values together.  Once I limited the query to just today, the spikes went away.
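Just to make the fix concrete, here's roughly what it looked like.  The server, database, table, and column names below are made up for illustration; the real fix was simply the date filter in the WHERE clause, which keeps yesterday's matching collection times from getting summed into today's chart.

# Hypothetical names; the only part that matters here is the date filter.
$query = @"
SELECT CollectionTime, FreeSpaceMB
FROM dbo.DiskStats
WHERE CollectionTime >= CAST(GETDATE() AS date)  -- today's rows only
ORDER BY CollectionTime;
"@

# Sanity-check it from PowerShell (or SSMS) before wiring it back into the report.
Invoke-Sqlcmd -ServerInstance ServerName -Database DBAdmin -Query $query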

I consider myself lucky that I only spent an hour on this.  But it just goes to show that when data’s involved, debugging instantly gets harder because you tend to make assumptions one way or another.

T-SQL Tuesday #016: Get DB Sums With Powershell

I’m recycling this December 2010 post for T-SQL Tuesday #016.

Here’s a cute little piece of code to add up the sizes of all your DBs on a server. It comes in handy when you’re trying to see how much space you’ll need to duplicate the server either for DR, or reporting, or even just to see how much backup space you’ll need.

Make sure you’re at the DB context of PS:

>cd SQLSERVER:\sql\ServerName\default\databases

Then type this code:

>dir | %{$Total += $_.size}

>$Total

Ok, that’s it. Not too bad, huh? Now, let’s work on making it look like something you can actually work with.

Ordinarily you would prettify this by just dividing it by the measure you want to convert it to. So if you wanted to convert it to GB, then that would look like this:

>$Total/1GB

However, you can’t do that here. Here’s what you need to know.

This output is in MB, and the "/1GB" trick assumes that the output is in bytes. So let's assume that your $Total is 4189811.4375. Remember now, that's in MB. Here's what the 1GB trick will yield.

>$Total/1GB

0.00390206597512588

That’s not even close to right. And again, that’s because the 1GB trick assumes your output will be in bytes. So here the best way to convert this to GB is to do the math yourself like this:

>$Total/1024

4091.61273193359

Of course, you can still use one of the tricks to get the number you’re looking for, but since you’re already in MB you have to go 1 measure down to make it come out right.

>$Total/1KB

4091.61273193359

So why is that? Well, because 1KB is only a shortcut for 1024, and from MB, the conversion to GB is /1024. And the problem of course with doing it this way is that it’s deceiving as hell. Anyone else looking at this will think you’ve converted to KB because they may not realize that the original output is in MB. So don’t use this one, I only threw it in there to show you how to come up with the right number. It also shows you something about how the trick works.

Now, you can still use the 1GB trick, you just have to massage the data first. So here we’re going to put it back into bytes and then convert it to GB. This is really just to prove to you that the output for this trick really is assumed to be in bytes.

>($Total*1024*1024)/1GB

4091.61273193359

Personally I’d never do it this way. Sure, it shows you and someone else what’s going on so you can read the code, but that’s also what comments are for.

>$Total/1024 # Convert to GB. Can’t use /1GB because this output is in MB and the /1GB assumes the output to be in bytes.

See, what’s wrong with comments?
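And if you want the whole thing in one shot, here's a compact variant.  Same assumptions as above: you're sitting at the databases node of the SQL provider, and the Size property comes back in MB, so /1KB is doing MB-to-GB duty again (comment it accordingly).

>$TotalGB = (dir | Measure-Object -Property Size -Sum).Sum / 1KB  # Size is in MB, so /1KB converts to GB

>"{0:N2} GB" -f $TotalGB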

New Video: Find query timeouts in Profiler

Here’s the scenario: Users are complaining that queries are timing out, but nobody knows what queries or SPs go with what tabs on the front end. It’s up to you to find out where the problem is…

This new video shows you how to get query timeouts from a profiler trace.
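The video covers the Profiler side of it, but if you've already saved the trace to a file, here's the general idea.  This isn't the script from the video, just a rough sketch: client query timeouts show up in a trace as Attention events, so you pull those out and then look at the Completed events on the same SPID right around that time to see what was actually running.  The file path and server name are placeholders.

$query = @"
SELECT t.SPID, t.StartTime, t.HostName, t.ApplicationName, t.LoginName
FROM fn_trace_gettable('C:\Traces\Timeouts.trc', DEFAULT) t
JOIN sys.trace_events e ON t.EventClass = e.trace_event_id
WHERE e.name = 'Attention'  -- how a client-side timeout/cancel shows up in a trace
ORDER BY t.StartTime;
"@

Invoke-Sqlcmd -ServerInstance ServerName -Database master -Query $query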

Join in our live DBAs at Midnight webshow
Fridays at 11pmCST on our Ustream channel!

Watch all past shows at MidnightDBA.com

Pinging SQL Server in Powershell

It quite often happens that there's some kind of issue keeping your app from hitting the DB.  Sometimes the issue is with SQL itself, but most of the time there's something else involved.  The problem is that when it's reported through the error stack at the app level, it only says that the DB was unavailable, and it's hard to get users to understand that it's just a generic error that means the app couldn't talk to the DB for whatever reason.  That reason could be a NIC, a cable, the OS, a switch, a router, etc.  There are many reasons why an app wouldn't be able to get to a SQL box.  And it's made even worse if it's intermittent.

So a good way to handle this is to put up a ping script that will actually query SQL.  Not one that checks that the server is up, or that the service is running, but one that actually pings SQL itself and runs a query.  I've done this for you in powershell.  It runs a simple select and then writes a result to a txt file.  I set up an agent job to run it every 5secs from a CmdExec task.

## C:\SQLPing.ps1
## Runs a trivial query against the server and logs OK/Failed with a timestamp.

add-pssnapin SqlServerCmdletSnapin100

$date = get-date

# -ErrorAction/-ErrorVariable keep a failed connection from killing the script;
# we only care whether any error was collected.
$a = Invoke-Sqlcmd -ServerInstance ServerName -Database master -Query "select @@servername" -ErrorAction SilentlyContinue -ErrorVariable err
if ($err.Count -eq 0) {$a = "OK"}
else {$a = "Failed"}

"$date  :  $a" | Out-File c:\VI2conn.txt -Append

This kind of script can help in a number of ways, and depending on the level of troubleshooting you want to do, you can place this on different boxes.  So you can place it on the local SQL box to have it test itself, or in another data center to test a specific link, or just in the same subnet or vlan, or put it in the same subnet as the app and then in a couple different ones.  How you spread it around depends on what you’re looking for.  But this can be a tiebreaker because if the apps people insist that SQL is down because of a generic error message, you can tell them that 2 boxes from 2 other subnets were able to ping it every 5secs, but the one from the app’s subnet failed 6 times or whatever.  This way you can also show something solid to the network guys and make it easier for them to find the problem on their end. 

Now, this won't tell you whether it's a switch, firewall, NIC, OS, or what, but it will tell you that it's not a problem with SQL itself or the SQL box.  Of course, if a couple of them fail then it could still be the SQL box, because again, it won't tell you that either; but it can be a tiebreaker in the right circumstance.
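And since the script just appends to a text file, pulling the failures back out to show the app or network guys is a one-liner (the path matches the script above):

Select-String -Path c:\VI2conn.txt -Pattern 'Failed'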

AND, just because I’m a nice guy here’s the job script too.  I did this in SQL2K8.

USE [msdb]
GO

/****** Object:  Job [SQLPing]    Script Date: 02/04/2011 14:20:44 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0

/****** Object:  JobCategory [[Uncategorized (Local)]]]    Script Date: 02/04/2011 14:20:44 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'SQLPing',
		@enabled=1,
		@notify_level_eventlog=0,
		@notify_level_email=0,
		@notify_level_netsend=0,
		@notify_level_page=0,
		@delete_level=0,
		@description=N'No description available.',
		@category_name=N'[Uncategorized (Local)]',
		@owner_login_name=N'sa',
		@job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

/****** Object:  Step [Ping]    Script Date: 02/04/2011 14:20:45 ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Ping',
		@step_id=1,
		@cmdexec_success_code=0,
		@on_success_action=1,
		@on_success_step_id=0,
		@on_fail_action=2,
		@on_fail_step_id=0,
		@retry_attempts=0,
		@retry_interval=0,
		@os_run_priority=0,
		@subsystem=N'CmdExec',
		@command=N'powershell "c:\SQLPing.ps1"',
		@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every 5secs',
		@enabled=1,
		@freq_type=4,
		@freq_interval=1,
		@freq_subday_type=2,
		@freq_subday_interval=10,
		@freq_relative_interval=0,
		@freq_recurrence_factor=0,
		@active_start_date=20110204,
		@active_end_date=99991231,
		@active_start_time=0,
		@active_end_time=235959,
		@schedule_uid=N'0b2e4594-92bc-438c-8311-d40076b53042'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
	IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

Why GoDaddy Sucks

Well, the site is finally back online.  I'm still moving a lot of the content over, but most of the vids are online again.

This whole debacle was caused by GoDaddy.  MidnightDBA.com has been up and running for 2yrs and for no reason whatsoever, it just stopped working last week.  None of my .net controls would work, so the menu and datagrids were all returning errors.  I tried to see if something went wrong on my end, and even pulled in the best .net guy I know from work, and nothing fixed it.  My work guy said that it seemed like GoDaddy was blocking .axd files and that’s why the content wasn’t working.  I finally got on with GoDaddy support, something I always dread because every time I’ve ever called them they’ve been absolutely useless.  Oh they’re polite, but every tech I’ve talked to with GoDaddy has been completely worthless and they do nothing but try to find ways to boot the call the second they start talking to you.

So anyway, I called support and no matter what I did I heard the same song again and again… “We don’t support custom coding.”

I tried and tried explaining to them that it's not custom coding and that the site had been working for 2yrs.  And they insisted that nothing was wrong with the server and that my code had to be bad.  And to prove it they pushed a stupid little text file up and since they could read it, they used that as justification to try to close the case.  So I had the ticket escalated and talked to a couple managers.  The problem is with me at this point… for some reason I actually expected a different result.  I didn't get one though.  The advanced escalation team fed me the same line of BS.  I even went around and around with them showing them a simple test page, and how the site worked on 3 other web servers, but they refused to listen to the simplest logic.  None of their support people know even the most basic principles of .net so they only gave me asinine ideas.  And that's if the idea were even an idea at all.  The first guy I talked to went to his 2nd-level support guy and came back and said, "He said you have to push a ticket to your web.config file and do something I didn't understand."  Now, I have no idea what 'push a ticket to your web.config' means, but even if it meant something, just the fact that he couldn't finish the sentence means something, doesn't it?  He should have at least gone back to the guy to ask for more details, but he wouldn't.  He was more concerned with booting the call because he doesn't support custom coding.  And of course, I'm not allowed to talk to the 2nd-level techs because I'm on a shared host and we're not important enough to talk to the higher-level morons.  So I was refused and was only allowed to talk to 1st-level guys.  And every time my tech would go away to ask someone something he'd be gone for around 30mins.

I did talk to one manager who actually got what I was saying though and she really went to bat for me.  Unfortunately, she works in a sea of morons and there’s just no way she’ll ever make a difference.  So after a week of being down, I signed up with another host, and amazingly the site came online right away.  I guess my code just decided to work again and it really didn’t have anything to do with GoDaddy’s servers, huh?  I mean, it must have been my mistake for thinking that code would run on their servers through whatever they do to the box.  It certainly wasn’t because they were blocking .axd files all of a sudden.  Because if they had for some reason decided to start blocking those file types, I know they would be responsible enough to tell their customers they were making a major change like that.  So I know now that it HAD to be me.  I didn’t do something right in my code.  So the code that I thought was solid and had been working for 2yrs just stopped on its own.  How could I have been so stupid? 

This is a warning to anyone who’s considering putting their site on GoDaddy.  If you do you’ll be sorry because they make changes on their servers without telling you about it and then blame you when something goes wrong.  If I hadn’t gone to another host, my site would have to be completely re-coded to use nothing but static HTML or classic ASP.  It’s ridiculous, unprofessional, and shows a real apathy for their customers.  GoDaddy has proven to me again and again that they don’t give a crap about their customers.  We’re just a paycheck to them.  But you know what?  If they keep this up we won’t be a paycheck much longer.  CIHost pulled a bunch of the same crap and look what happened to them.

SO DON’T GO TO GODADDY FOR ANYTHING!!!

Best Backup Class Evar: 1 of 6

Here’s the first of a series of backup/restore classes taught at the NTSSUG. This series is meant to cover everything about backups… or at least as much as we can squeeze into 6 sessions. You’ll learn how to put together an enterprise backup solution, tune your backups for maximum performance, and manage them across your company no matter how many servers you have. The file used for this class can be found here.

Title: Best Backup Class Evar: 1 of 6 (Right-click to download video)
Date: 1/20/2011
Length: 57:31
Size: 58,381 KB

Here are the other series-style classes I’ve taught at NTSSUG:


Join in our live DBAs at Midnight webshow
Fridays at 11pmCST on our Ustream channel!

Watch all past shows at MidnightDBA.com

T-SQL Tuesday #014 – Resolutions

OK, it’s time for TSQLTuesday again and Jen’s making me write something since we’re hosting this month.  So the topic is resolutions, and that in itself isn’t a topic that’s near and dear to me because frankly I just don’t believe in them.  I don’t think you have to wait until a new year begins to resolve to do something you’ve been meaning to do.  In fact, that pretty much dooms you to not completing it because it takes more than the turning of a calendar page and a romantic notion to accomplish something.  If it were really that easy, you would have done it already so it wouldn’t be a big deal. 

Your new year can start anytime really.  Hell, doing a new year’s resolution doesn’t even line up with my review period at work, so if I relied on the new year to start something new I’d lose 3mos making good on what I’m supposed to accomplish for work.  People in IT quite often put personal goals in their yearly goals at work.  Things like getting certified, or perfecting a process, or taking management classes, etc are all things that are commonly found in your yearly goals at work.  So if you’re going to make some kind of resolution to do something, or to stop doing something, why not put it where it actually makes more sense… in your work goals.  Your bonus quite often relies on you completing your goals so it’s really the perfect place.  And it gives you a better excuse to have the resolution to begin with because you can use the bonus as motivation.

So if you're going to make a resolution, at work or otherwise, try to make it something you can actually do.  One of the biggest reasons for failure is that someone will set a goal that's completely ridiculous for them, and when the goal starts slipping they get discouraged and just give up.  "I'd like to get my MCM this year, but I don't even have any of the lower certs yet."  Well, chances are you're not going to make it, dude.

With all that said here are my resolutions:

  1. Try to have more patience with customers and vendors.
  2. Get better with XML.
  3. Study Oracle more.
  4. Get more serious about learning Chinese.
  5. Find a better balance between work and family.


I’ve also heard that a few of you have vowed to finally learn powershell this year.  That’s a fine goal and there really is no better time because there are plenty of people giving all kinds of free resources.  One of the things I’ve done is to go out of my way to create some good video resources for those who know nothing about it and want to learn from the ground up.  Here are a few direct links so you can get started without any further ado.

http://midnightdba.itbookworm.com/VidPages/PowerShellOverview/PowerShellOverview.aspx

http://midnightdba.itbookworm.com/VidPages/PowershellBeginningForDBAs1/PowershellBeginningForDBAs1.aspx

http://midnightdba.itbookworm.com/VidPages/PowershellBeginningForDBAs2/PowershellBeginningForDBAs2.aspx

http://midnightdba.itbookworm.com/VidPages/PowershellPSDrive/PowershellPSDrive.aspx

http://midnightdba.itbookworm.com/VidPages/PowershellProfiles/PowershellProfiles.aspx

OK, that’s all I’ve got.  Good times.

Watch the show Tonight

Jen has already blogged about this HERE and she has told me to do the same. 

Our weekly webshow is DBAs@Midnight, or DBAs At Midnight as Jen calls it, but the official name is DBAs@Midnight.  Anyway, tonight's show is going to be epic.  I've got something really big planned, so if you don't ordinarily watch the show live, you should this week.

Here’s the rundown of the show timeline:

10:30pm CST — preshow.  We talk about whatever we feel like with the audience.  It’s a good time.

11:00pm CST — This is the main show.  This is what you get when you watch the recordings, but it’s much more fun live because you get to see what everyone is talking about in the chatroom.

12:00am CST — postshow.  Here we tell the underbelly side of the stories we told during the taping, and then just talk about stuff with the audience again.  I think mostly it lasts about 30-45mins, but our record is 2.5hrs because we all got on a roll about something.

So anyway, watch the show tonight.  You won’t be disappointed.

Here’s the link:  www.ustream.tv/channel/dbasatmidnight

Instead of working, I blog.