All posts by Sean McCown

I am a Contributing Editor for InfoWorld Magazine, and a frequent contributor to SQLServerCentral.com as well as SSWUG.org. I live with my wife and 3 kids, and have practiced and taught Kenpo for 22yrs now.

Getting a good look at deadlocks

One of the things that’s a major concern for DBAs is monitoring deadlocks.  Capturing them is really easy; the more difficult part is sifting through all the info once you’ve got it.  The deadlock graph in Profiler is my favorite tool for gathering them, and I prefer to shred the XML results into a table so I have all the info right there in front of me.

So here I’ve got a script I’ve written that shreds your deadlock XML into a relational table.  Actually, I modified this from the one Trevor Barkhouse did for his deadlocking presentation.  I just wanted some friendlier cols, and more info in the table.  Once you get the data into this format though, you can see how easy it is to turn it into a real process. 

What I love the most about this is how easy it is to go through a high number of deadlocks and query them for anything you like.  You can easily get a picture of your entire environment with very little effort.  Now, for this you’ll have to get the deadlock graphs into XML to begin with, and while I could show you that here, Trevor does a good job of it, so I’ll just point you to his blog on the topic.  However, once you have the XML, here’s my query to mine it.

Here’s the code:
USE [tempdb];
GO

DECLARE @DeadlockCount INT;
DECLARE @DeadlockGraph XML;
DECLARE @DeadlockGraphFileName NVARCHAR(255);
DECLARE @DynamicSQL NVARCHAR(1024);

SET @DeadlockGraphFileName = 'G:\DeadlockFiles\DeleteCV3OrderGenerationQueue\AllEvents.xml';
--'G:\DeadlockFiles\20110405-20110418_1.xdl';

-- The OpenRowset() T-SQL function requires a string literal for the file
-- name, so use dynamic SQL to avoid hardcoding a path (other than in the
-- above assignment to the @DeadlockGraphFileName variable... but that's
-- a little clearer).
SET @DynamicSQL = N'SELECT @XMLDocument = CAST([BulkColumn] AS XML)' + NCHAR(10) +
    N'FROM OpenRowset' + NCHAR(10) +
    N'(' + NCHAR(10) +
    N'    BULK ''' + @DeadlockGraphFileName + N''',' + NCHAR(10) +
    N'    SINGLE_BLOB' + NCHAR(10) +
    N') AS [XMLDocument];';

PRINT @DynamicSQL;

-- Execute the dynamic SQL and pull the result into the @DeadlockGraph
-- local variable.
EXECUTE [master].[sys].[sp_executesql]
    @stmt = @DynamicSQL,
    @params = N'@XMLDocument XML OUTPUT',
    @XMLDocument = @DeadlockGraph OUTPUT;

-- Count the number of deadlock graphs in the XDL file.
SELECT @DeadlockCount = DeadlockList.Graphs.value('count(deadlock)', 'Int')
FROM @DeadlockGraph.nodes('/deadlock-list') AS DeadlockList(Graphs);

PRINT 'File            :   ' + @DeadlockGraphFileName + CHAR(10) +
    'Deadlock count  :   ' + CAST(@DeadlockCount AS VARCHAR);

-- Extract the parties to the deadlock.
DECLARE @PartiesToTheDeadlocks TABLE
(
    [VictimProcessID] VARCHAR(25),
    [VictimLastBatchStarted] DATETIME,
    [VictimLockMode] VARCHAR(15),
    [VictimXActID] VARCHAR(15),
    [VictimClientApp] VARCHAR(50),

    [LiveProcessID] VARCHAR(25),
    [LiveLastBatchStarted] DATETIME,
    [LiveLockMode] VARCHAR(15),
    [LiveXActID] VARCHAR(15),
    [LiveClientApp] VARCHAR(50),

    [LiveFileID] VARCHAR(15),
    [LivePageID] VARCHAR(15),
    [LiveObjName] VARCHAR(50),
    [LiveLockModeHeld] VARCHAR(50),
    [VictimLockModeRequest] VARCHAR(50),

    [VictimFileID] VARCHAR(15),
    [VictimPageID] VARCHAR(15),
    [VictimObjName] VARCHAR(50),
    [VictimLockModeHeld] VARCHAR(50),
    [LiveLockModeRequest] VARCHAR(50),

    [VictimProcName] VARCHAR(100),
    [VictimExecStack] VARCHAR(MAX),
    [LiveProcName] VARCHAR(100),
    [LiveExecStack] VARCHAR(MAX),
    [VictimInputBuffer] VARCHAR(2048),
    [LiveInputBuffer] VARCHAR(2048)
);

INSERT INTO @PartiesToTheDeadlocks
SELECT VictimProcessID,
       VictimLastBatchStarted,
       VictimLockMode,
       VictimXActID,
       VictimClientApp,
       LiveProcessID,
       LiveLastBatchStarted,
       LiveLockMode,
       LiveXActID,
       LiveClientApp,
       LiveFileID,
       LivePageID,
       LiveObjName,
       LiveLockModeHeld,
       VictimLockModeRequest,
       VictimFileID,
       VictimPageID,
       VictimObjName,
       VictimLockModeHeld,
       LiveLockModeRequest,
       VictimProcName,
       VictimExecStack,
       LiveProcName,
       LiveExecStack,
       VictimInputBuffer,
       LiveInputBuffer
FROM
(
    SELECT
        --Victim process.
        DeadlockList.Graphs.value('(process-list/process[1]/@spid)[1]', 'NVarChar(15)') AS VictimProcessID,
        CAST(REPLACE(DeadlockList.Graphs.value('(process-list/process[1]/@lastbatchstarted)[1]', 'NChar(23)'), N'T', N' ') AS DATETIME) AS VictimLastBatchStarted,
        DeadlockList.Graphs.value('(process-list/process[1]/@lockMode)[1]', 'NVarChar(15)') AS VictimLockMode,
        DeadlockList.Graphs.value('(process-list/process[1]/@xactid)[1]', 'NVarChar(15)') AS VictimXActID,
        DeadlockList.Graphs.value('(process-list/process[1]/@clientapp)[1]', 'NVarChar(50)') AS VictimClientApp,
        --Live process.
        DeadlockList.Graphs.value('(process-list/process[2]/@spid)[1]', 'NVarChar(15)') AS LiveProcessID,
        CAST(REPLACE(DeadlockList.Graphs.value('(process-list/process[2]/@lastbatchstarted)[1]', 'NChar(23)'), N'T', N' ') AS DATETIME) AS LiveLastBatchStarted,
        DeadlockList.Graphs.value('(process-list/process[2]/@lockMode)[1]', 'NVarChar(15)') AS LiveLockMode,
        DeadlockList.Graphs.value('(process-list/process[2]/@xactid)[1]', 'NVarChar(15)') AS LiveXActID,
        DeadlockList.Graphs.value('(process-list/process[2]/@clientapp)[1]', 'NVarChar(50)') AS LiveClientApp,
        --Live resource.
        DeadlockList.Graphs.value('(resource-list/pagelock[1]/@fileid)[1]', 'NVarChar(15)') AS LiveFileID,
        DeadlockList.Graphs.value('(resource-list/pagelock[1]/@pageid)[1]', 'NVarChar(15)') AS LivePageID,
        DeadlockList.Graphs.value('(resource-list/pagelock[1]/@objectname)[1]', 'NVarChar(50)') AS LiveObjName,
        DeadlockList.Graphs.value('(resource-list/pagelock[1]/@mode)[1]', 'NVarChar(50)') AS LiveLockModeHeld,
        DeadlockList.Graphs.value('(resource-list/pagelock[1]/waiter-list/waiter/@mode)[1]', 'NVarChar(50)') AS VictimLockModeRequest,
        --Victim resource.
        DeadlockList.Graphs.value('(resource-list/pagelock[2]/@fileid)[1]', 'NVarChar(15)') AS VictimFileID,
        DeadlockList.Graphs.value('(resource-list/pagelock[2]/@pageid)[1]', 'NVarChar(15)') AS VictimPageID,
        DeadlockList.Graphs.value('(resource-list/pagelock[2]/@objectname)[1]', 'NVarChar(50)') AS VictimObjName,
        DeadlockList.Graphs.value('(resource-list/pagelock[2]/@mode)[1]', 'NVarChar(50)') AS VictimLockModeHeld,
        DeadlockList.Graphs.value('(resource-list/pagelock[2]/waiter-list/waiter/@mode)[1]', 'NVarChar(50)') AS LiveLockModeRequest,
        --Procs, execution stacks, and input buffers.
        DeadlockList.Graphs.value('(process-list/process[1]/executionStack/frame/@procname)[1]', 'NVarChar(100)') AS VictimProcName,
        DeadlockList.Graphs.value('(process-list/process[1]/executionStack/frame)[1]', 'VarChar(max)') AS VictimExecStack,
        DeadlockList.Graphs.value('(process-list/process[2]/executionStack/frame/@procname)[1]', 'NVarChar(100)') AS LiveProcName,
        DeadlockList.Graphs.value('(process-list/process[2]/executionStack/frame)[1]', 'VarChar(max)') AS LiveExecStack,
        RTRIM(LTRIM(REPLACE(DeadlockList.Graphs.value('(process-list/process[1]/inputbuf)[1]', 'NVarChar(2048)'), NCHAR(10), N''))) AS VictimInputBuffer,
        RTRIM(LTRIM(REPLACE(DeadlockList.Graphs.value('(process-list/process[2]/inputbuf)[1]', 'NVarChar(2048)'), NCHAR(10), N''))) AS LiveInputBuffer
    FROM @DeadlockGraph.nodes('/deadlock-list/deadlock') AS DeadlockList(Graphs)
) AS [_ProcessDetails];

-- Display the parties to the deadlock.
SELECT *
FROM @PartiesToTheDeadlocks
ORDER BY VictimLastBatchStarted, VictimXActID;

Now, the code formatter messed up the coloring, but the code works. 

Now, I’ll list the cols and explain them, ’cause I know that’s the next thing you’re gonna want.  They’re broken up into 2 categories… the Victim process and the Live process.  I didn’t know what else to call the surviving process other than ‘live’ and still keep it short enough to prefix the cols with.

VictimProcessID – spid of victim.

VictimLastBatchStarted – date/time of last batch for victim.

VictimLockMode – lock held by victim.

VictimXActID – trans ID for victim.

VictimClientApp – Victim client app.

LiveProcessID – spid of live process.

LiveLastBatchStarted – date/time of last batch for live process.

LiveLockMode – lock held by live process.

LiveXActID – trans ID for the live process.

LiveClientApp – client app for live process.

LiveFileID – file ID where the locked resource for live process is held.

LivePageID – page ID the processes are fighting over.  The live process currently holds the lock on this one.

LiveObjName – object the processes are fighting over.  The live process currently holds the lock on this one.

LiveLockModeHeld – lock held on the above object by the live process.

VictimLockModeRequest – lock being requested on the above object by the victim.

VictimFileID – file ID where the object the victim process is holding is held.

VictimPageID – Page ID the processes are fighting over.  The victim currently holds the lock on this one.

VictimObjName – object the processes are fighting over.  The victim currently holds the lock on this one.

VictimLockModeHeld – lock held on the above object by the victim.

LiveLockModeRequest – lock being requested on the above object by the live process.

VictimProcName – the SP called by the victim.

VictimExecStack – the stmt in the SP that caused the deadlock.

LiveProcName – the SP called by the live process.

LiveExecStack – the stmt in the SP that caused the deadlock.

VictimInputBuffer – the exact SQL that was sent to the SQL engine.

LiveInputBuffer – the exact SQL that was sent to the SQL engine.

Now you can see at a glance where your problem deadlocks are, how often they’re occurring, and which resources are under the most contention.
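And once you’ve got deadlocks shredded like this, the mining queries write themselves.  Here’s a quick sketch; since the script above uses a table variable, you’d persist the results somewhere first, and the dbo.DeadlockHistory table name here is just made up for the example:

--Which objects deadlock the most, and how recently.
SELECT VictimObjName,
       COUNT(*) AS DeadlockCount,
       MAX(VictimLastBatchStarted) AS MostRecent
FROM dbo.DeadlockHistory
GROUP BY VictimObjName
ORDER BY DeadlockCount DESC;

One little GROUP BY and you know exactly where to spend your tuning time.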

Good stuff that.

Something’s just not right

Like every DBA, I have customers that are harder to deal with than others, and I have some that are just impossible.  Now, I’m not talking about the annoying ones that ask for weird stuff; I’m talking about the ones you just hate to work with because they’re condescending, rude, etc.  I’ve got 2 such customers right now.  And whenever I see their names on a meeting request I try to pawn them off onto one of my other DBAs if I can.  If I can’t though, I have to suck it up and endure the meeting myself.

Well, I actually had meetings with both of them today.  I was able to send one of my guys to the 1st one, but I had to attend the 2nd one myself.  Now, this is a guy who typically likes to attack whatever I say.  He doesn’t just disagree with me, he actually attacks me.  So I’m sure I don’t have to tell you guys that I wasn’t looking forward to this meeting.  When I got there though, he was polite, thoughtful, and actually listened to what I said.  And believe me, I’ve learned not to say much around him unless specifically asked something because I just don’t need the fight.  So we were about to leave, and everyone else was on their way out, and he pulled me back.  Ok, here it comes… this is what I’ve been waiting for, only now there are no witnesses.  So he says to me… I just wanted clarification on this one thing.  Exactly what do we need to back up and where does it need to be stored?  OK, I’m starting to think something’s wrong now, because this guy never admits he doesn’t know something, and in fact, spends his time telling me I don’t know what I’m talking about.

Then I get a report from my other DBA telling me that the guy in his meeting was the same way.  He just kept his mouth shut and answered questions, and that’s it.  No accusations, no insistence on ridiculous information, etc.  He just listened and went on about his day.

Now I know something’s up.  This isn’t normal and I’m trying to figure out where the real attack is coming from.  If this were chess it would be the equivalent of Bobby Fischer handing you his queen early in the game.  DON’T EVER TAKE THAT QUEEN.  So anyway, just waiting for the other shoe to drop.

How to Curse in an Interview

Last weekend at SQL Saturday #63 Dallas, I presented a session titled “How to Curse in an Interview”.  It clearly contains some NSFW language, but it was really a good session, and I got a lot of positive feedback.  The recorded session is up on the MidnightDBA Events page.

Here is the abstract:

How to curse in an interview

Prof. Development Track – 2:45 pm to 3:45 pm

This session will take a hard look at different things that happen in interviews and give you what you need to know to get that next job. This isn’t a recruiter session that’ll tell you to wear a tie and smile. I’m a real Sr. DBA who hires DBAs all the time. I’m going to tell you what we look for in a DBA, and the types of things you should and shouldn’t say. I’ll show you how to pass a tech screening and get past the HR portion as well. Your mama can teach you how to dress. I’m gonna teach you how to get hired. You seriously can’t afford to miss this discussion because it’ll be the most candid anyone will ever be with you about interviews. This will be a fun and very interactive session.

And as long as we’re here, here are a few other things I’ve presented at SQL Saturdays:

A Lovely Statistics Query

Most of you know that you can set SQL to auto update stats.  And I’m sure most of you know that it invalidates stats at about 20%.  What that means is that when about 20% of the data in the table has changed, the stats are invalidated.  It’s actually 500 rows + 20% of the table if you wanna be specific.  And notice I say they’re invalidated and not updated.  Stats aren’t updated when that 20% mark is reached, they’re only invalidated.  They’re not rebuilt until they’re needed.  Well, for really large tables that can result in stats being outdated longer than they should be, because the number of changes needed to invalidate them is much higher.  The same issue can occur with smaller tables where the changes are varied enough to throw the distribution out of whack.  Some tables are just more sensitive to change than others and that 20% may be too much.  You may find that you need to update stats more often… say at 10% or in extreme cases even 5%.
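To put some numbers on that, here’s a little back-of-the-napkin T-SQL (plain arithmetic, nothing server-specific) showing how many changed rows the 500 + 20% rule takes at a few table sizes:

--How many changed rows it takes to invalidate stats at 500 + 20%.
SELECT rowcnt,
       500 + CAST(rowcnt * 0.20 AS BIGINT) AS ChangesToInvalidate
FROM (VALUES (10000), (1000000), (100000000)) AS t(rowcnt);

On a 10,000-row table that’s 2,500 changes, but on a 100-million-row table it’s over 20 million.  That’s why the big tables can sit on stale stats for so long.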

Here’s a sexy little query I put together that will give you the percentage of change an index has had.  You can not only use it to drive a job that updates stats at a different percentage, but you can also just query it and see what percentage your tables are at and how fast they change.  It’s an excellent tool for troubleshooting possible stats problems, and it’s pretty handy all around.

--Get stats percentage of change for tables.
SELECT so.SCHEMA_ID,
       ss.name AS [Schema],
       OBJECT_NAME(si.id) AS TableName,
       si.name AS IndexName,
       CAST((CAST(si.rowmodctr AS FLOAT) / CAST(si.rowcnt AS FLOAT)) * 100 AS INT) AS Pct,
       si.rowcnt,
       STATS_DATE(si.id, si.indid) AS [StatsDate]
FROM sys.sysindexes si
INNER JOIN sys.objects so
   ON si.id = so.object_id
INNER JOIN sys.schemas ss
   ON so.schema_id = ss.schema_id
WHERE si.rowcnt >= 1000000 --500
AND si.rowmodctr > 0
--AND OBJECT_NAME(si.id) = 'TableName'
ORDER BY Pct DESC, si.rowcnt DESC, si.rowmodctr;

Here are a couple notes on the query:

1.  Notice there’s a way to query for a specific table.

2.  The StatsDate col shows you the last time the stats were updated.

3.  I’m also limiting the rowcount to indexes with at least a million rows.  Feel free to lower that if you like.

4.  rowmodctr > 0 is something I threw in there to filter out indexes that haven’t changed at all; along with the rowcnt filter it keeps the divide by zero error out of there.  It also filters out the system-created stats.

OK, I hope you guys like this one.  I personally love it.  It’s allowed me to build a process to update stats on some of my tables more aggressively than 20% and keep my server running at its peak.
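In case you’re curious what that process looks like, here’s a minimal sketch of the idea.  The 10% threshold, the heap filter, and the PRINT-before-EXEC step are all just illustration; tune them for your own shop:

--Build UPDATE STATISTICS statements for any index over a 10% change threshold.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql = @sql + N'UPDATE STATISTICS '
     + QUOTENAME(ss.name) + N'.' + QUOTENAME(OBJECT_NAME(si.id))
     + N' ' + QUOTENAME(si.name) + N';' + NCHAR(10)
FROM sys.sysindexes si
INNER JOIN sys.objects so ON si.id = so.object_id
INNER JOIN sys.schemas ss ON so.schema_id = ss.schema_id
WHERE si.rowcnt >= 1000000
AND si.rowmodctr > 0
AND si.indid > 0 --skip heaps; UPDATE STATISTICS needs an index or stats name here
AND CAST((CAST(si.rowmodctr AS FLOAT) / CAST(si.rowcnt AS FLOAT)) * 100 AS INT) >= 10;

PRINT @sql;   --eyeball the statements first
EXEC (@sql);  --then let an Agent job run them

Drop that in an Agent job on whatever schedule fits and you’ve got stats updating at your own threshold instead of SQL’s.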

Troubleshooting a replication scenario

I just had a nice replication troubleshooting experience I thought I’d share with you guys.  I got a call from our app team saying that replication was failing.  I looked and the error was a bit unexplainable.

“Column names in each table must be unique.”

OK, so why the hell is it telling me that?  This publication has been running for months now with no problems.  I checked the table in question on both sides, and as it turns out the column it was complaining about had an int data type on the publication side and a uniqueidentifier on the subscription side.  So how did these columns get different data types?  Did someone change the data type on the subscriber to an incompatible data type?  No, that’s probably not it.  And that’s not the error message you would get for something like that anyway.  But someone had to change the data type, right?  Well, only kinda.  Here’s what actually happened.

One of the devs decided he needed this column in the table, and instead of going to the DBAs, he added it to the subscriber himself.  He then added it to the publisher (with a different data type).  So when replication went to replicate the DDL change, it saw that the subscription side already had the column and spit out that excellent error message.
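A 30-second sanity check on both servers would have caught this before replication ever saw the DDL.  Something like this, run on the publisher and then the subscriber (the table and column names are made up for the example), shows the mismatch immediately:

--Compare the column's definition on both sides before replicating the DDL.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'MyReplicatedTable'
AND COLUMN_NAME = 'NewColumn';

If the DATA_TYPE values don’t match, fix that before you touch the publication.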

Let this be a lesson to those of you who have DBAs and constantly look for ways to circumvent them.  Why don’t you stop every now and then and implement things properly?  And by properly I mean slow down and take the time to think about what you’re doing.  The days are over when you can just do whatever you want because it crosses your mind… especially when you have replication running.  You have to plan schema changes or you’ll spend tons of time troubleshooting and fixing your scenario.  Not to mention what it does to whatever processes rely on this replication process.  Anyway, that’s all I’ve got for now.  Let’s all be safe in production.

A Round of Patching

I just finished an interesting email thread with a user, and it’s not at all an uncommon scenario.  They were just installing a CU on some of their SQL boxes, and there was evidently a new bug introduced in that CU that causes their SSRS reports to render incorrectly.  The sad thing is that this is the kind of thing that should have been caught in testing.  Now of course, I don’t expect them to find everything during testing, but the simple rendering of reports should have been caught early on, because the second they turned their reporting structure back on, report rendering started messing up.  Now, he assures me that they tested it, but what exactly did they test?  Did they do any user testing at all or did they just test the install of the patch itself?  I see that quite often.  I see shops claim to do patch testing, but all they’re concerned with is whether the install itself will fail.

Most of the time I blame managers and companies in general for this, because the IT staff knows what’s at stake more than anyone, but they’re not given the proper time and resources it takes to test properly.  Managers always claim to know the business needs better, yet they never seem to fully grasp the implications of putting untested code into production.  All they want to know is, will this bring down my server?  Unfortunately that always means something different to them than it does to us.  What they mean is, will this fail and put us into an emergency situation?  And the answer is probably not.  However, disabling a critical piece of your business is putting you into an emergency situation.  And in this case they’re lucky it’s just report rendering.  What if it had been something more serious that caused some kind of logical corruption in their processing?

I would say that quite often the biggest problems caused by hotfixes are performance-related.  And by not testing a CU or hotfix properly you could bring your entire business to its knees.  You can slow your website down so much that nobody can get any real work done.  And what does that do for the reliability and confidence of your business?  It would be really nice sometimes if we in IT didn’t have to rely solely on people who know nothing about it to get our jobs done properly. 

And what’s amazing to me is that companies today still, after all this time and all the mistakes, don’t think it’s necessary to hire competent people to code and test.  Oh sure, they pay lip service to it all day long, but what do they really do about it?  Do they really slow down their dev cycles enough to do benchmarking and architect solutions properly?  Do they spend the money necessary to ensure that their code and servers are solid by hiring enough competent people?  Because every shop I’ve been in works their people so hard there’s no time for anything unexpected to happen.  So I think I’ve said this before, but now I’m saying it again.  It’s time to stop being children about IT.  It’s time to stop letting people who know nothing about it make the technical decisions that affect entire projects.  And it’s definitely time to stop thinking we can get by without testing things properly.  And that includes functional testing as well as performance benchmarking.

A dumb SSRS problem

I spent about an hour last night working to solve this stupid problem with a report so I thought I’d share it with you guys.  I’ve had this problem before, but for some reason I never remember it until I’ve had to work through it for a long time.

The problem is that I was trying to put a simple graph on the page, and I often like to work with really simple test data.  So I chose a column that had the same value all the way down, which would give me a straight line on my graph.  Instead, I had 3 spikes.  There shouldn’t be any spikes; the data’s all the same.

I laid the data out into a table in the report and the data was certainly all the same.  Then I ran the report query in SSMS and verified again that the column only had a single value in it.  So where were the spikes coming from?  As it turns out, the problem was with both my query and the collections.  I was collecting disk data every 10mins from a server and putting it into a table.  What I wanted was to graph today’s data by time.  And what I forgot to do was limit the query to just today.  And since it collects the data every day at more or less the same times, there were some times from yesterday that matched the collection times from today, so SSRS was adding the values together.  Once I limited it to just today, the spikes went away.
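In query terms the fix was nothing more than a date filter.  Here’s a sketch of the idea; the table and column names are made up for the example:

--Limit the report's dataset to today's collections only.
SELECT CollectionTime, PctDiskUsed
FROM dbo.DiskSpaceStats
WHERE CollectionTime >= CAST(GETDATE() AS DATE) --today only
ORDER BY CollectionTime;

Without that WHERE clause, yesterday’s 10:00 row and today’s 10:00 row land on the same point on the time axis and the chart sums them.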

I consider myself lucky that I only spent an hour on this.  But it just goes to show that when data’s involved, debugging instantly gets harder because you tend to make assumptions one way or another.

T-SQL Tuesday #016: Get DB Sums With Powershell

I’m recycling this December 2010 post for T-SQL Tuesday #016.

Here’s a cute little piece of code to add up the sizes of all your DBs on a server. It comes in handy when you’re trying to see how much space you’ll need to duplicate the server either for DR, or reporting, or even just to see how much backup space you’ll need.
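As an aside, if you’d rather pull the same total from straight T-SQL instead of the provider, something like this gets you in the same ballpark (size in sys.master_files is in 8KB pages, so you have to multiply out to MB):

--Total size of all DBs in MB, straight from T-SQL.
SELECT SUM(CAST(size AS BIGINT)) * 8 / 1024 AS TotalMB
FROM sys.master_files;

But the PowerShell way is cuter, so let’s do it that way.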

Make sure you’re at the DB context of PS:

>cd SQLSERVER:\sql\ServerName\default\databases

Then type this code:

>dir | %{$Total += $_.size}

>$Total

Ok, that’s it. Not too bad, huh? Now, let’s work on making it look like something you can actually work with.

Ordinarily you would prettify this by just dividing it by the measure you want to convert it to. So if you wanted to convert it to GB, then that would look like this:

>$Total/1GB

However, you can’t do that here. Here’s what you need to know.

This output is in MB, and the “/1GB” trick assumes that the output is in bytes. So let’s assume that your $Total is 4189811.4375. Remember now, that’s in MB. So here’s what the 1GB trick will yield.

>$Total/1GB

0.00390206597512588

That’s not even close to right. And again, that’s because the 1GB trick assumes your output will be in bytes. So here, the best way to convert this to GB is to do the math yourself, like this:

>$Total/1024

4091.61273193359

Of course, you can still use one of the tricks to get the number you’re looking for, but since you’re already in MB you have to go 1 measure down to make it come out right.

>$Total/1KB

4091.61273193359

So why is that? Well, because 1KB is only a shortcut for 1024, and from MB, the conversion to GB is /1024. And the problem of course with doing it this way is that it’s deceiving as hell. Anyone else looking at this will think you’ve converted to KB because they may not realize that the original output is in MB. So don’t use this one, I only threw it in there to show you how to come up with the right number. It also shows you something about how the trick works.

Now, you can still use the 1GB trick, you just have to massage the data first. So here we’re going to put it back into bytes and then convert it to GB. This is really just to prove to you that the output for this trick really is assumed to be in bytes.

>($Total*1024*1024)/1GB

4091.61273193359

Personally I’d never do it this way. Sure, it shows you and someone else what’s going on so you can read the code, but that’s also what comments are for.

>$Total/1024 # Convert to GB. Can't use /1GB because this output is in MB and /1GB assumes the output to be in bytes.

See, what’s wrong with comments?

Pinging SQL Server in Powershell

It quite often happens that there will be some kind of issue keeping your app from hitting the DB.  Sometimes the issue is with SQL itself, but most of the time there’s something else involved.  The problem is that when it’s reported through the error stack at the app level, it only says that the DB was unavailable, and it’s hard to get users to understand that it’s just a generic error that means the app couldn’t talk to the DB for whatever reason.  That reason could be NIC, or cable, OS, switch, router, etc.  There are many reasons why an app wouldn’t be able to get to a SQL box.  And it’s made even worse if it’s intermittent.

So a good way to handle this is to put up a ping script that will actually query SQL.  Not check that the server is up, or that the service is running, but actually ping SQL itself and run a query.  I’ve done this for you in powershell.  It runs a simple select and then writes a result to a txt file.  I set up an agent job to run every 5secs and run this code from a CmdExec task.

##C:\SQLPing.ps1
add-pssnapin sqlservercmdletsnapin100

$date = get-date

$a = Invoke-Sqlcmd -ServerInstance ServerName -Database master -Query "select @@servername" -ErrorAction silentlyContinue -ErrorVariable err
if ($err.count -eq 0) {$a = "OK"}
else {$a = "Failed"}

# if (!$a) {$b = "Failed"}
"$date  :  $a" | Out-File c:\VI2conn.txt -append

This kind of script can help in a number of ways, and depending on the level of troubleshooting you want to do, you can place this on different boxes.  So you can place it on the local SQL box to have it test itself, or in another data center to test a specific link, or just in the same subnet or vlan, or put it in the same subnet as the app and then in a couple different ones.  How you spread it around depends on what you’re looking for.  But this can be a tiebreaker because if the apps people insist that SQL is down because of a generic error message, you can tell them that 2 boxes from 2 other subnets were able to ping it every 5secs, but the one from the app’s subnet failed 6 times or whatever.  This way you can also show something solid to the network guys and make it easier for them to find the problem on their end. 

Now, this won’t tell you if it’s a switch, firewall, NIC, OS, or what, but it will tell you that it’s not a problem with SQL itself or the SQL box.  Of course, if a couple of them fail then it could still be the SQL box, because again, it won’t tell you that either, but it could be a tiebreaker in the right circumstance.

AND, just because I’m a nice guy here’s the job script too.  I did this in SQL2K8.

USE [msdb]
GO

/****** Object: Job [SQLPing] Script Date: 02/04/2011 14:20:44 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [[Uncategorized (Local)]]] Script Date: 02/04/2011 14:20:44 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'SQLPing',
    @enabled=1,
    @notify_level_eventlog=0,
    @notify_level_email=0,
    @notify_level_netsend=0,
    @notify_level_page=0,
    @delete_level=0,
    @description=N'No description available.',
    @category_name=N'[Uncategorized (Local)]',
    @owner_login_name=N'sa',
    @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Ping] Script Date: 02/04/2011 14:20:45 ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Ping',
    @step_id=1,
    @cmdexec_success_code=0,
    @on_success_action=1,
    @on_success_step_id=0,
    @on_fail_action=2,
    @on_fail_step_id=0,
    @retry_attempts=0,
    @retry_interval=0,
    @os_run_priority=0,
    @subsystem=N'CmdExec',
    @command=N'powershell "c:\SQLPing.ps1"',
    @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every 5secs',
    @enabled=1,
    @freq_type=4,
    @freq_interval=1,
    @freq_subday_type=2,
    @freq_subday_interval=10,
    @freq_relative_interval=0,
    @freq_recurrence_factor=0,
    @active_start_date=20110204,
    @active_end_date=99991231,
    @active_start_time=0,
    @active_end_time=235959,
    @schedule_uid=N'0b2e4594-92bc-438c-8311-d40076b53042'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

Why GoDaddy Sucks

Well, the site is finally back online.  I’m still moving a lot of the content over, but most of the vids are online again.

This whole debacle was caused by GoDaddy.  MidnightDBA.com has been up and running for 2yrs and for no reason whatsoever, it just stopped working last week.  None of my .net controls would work, so the menu and datagrids were all returning errors.  I tried to see if something went wrong on my end, and even pulled in the best .net guy I know from work, and nothing fixed it.  My work guy said that it seemed like GoDaddy was blocking .axd files and that’s why the content wasn’t working.  I finally got on with GoDaddy support, something I always dread because every time I’ve ever called them they’ve been absolutely useless.  Oh they’re polite, but every tech I’ve talked to with GoDaddy has been completely worthless and they do nothing but try to find ways to boot the call the second they start talking to you.

So anyway, I called support and no matter what I did I heard the same song again and again… “We don’t support custom coding.”

I tried and tried explaining to them that it’s not custom coding and that the site had been working for 2yrs.  And they insisted that nothing was wrong with the server and that my code had to be bad.  And to prove it they pushed a stupid little text file up, and since they could read it, they used that as justification to try to close the case.  So I had the ticket escalated and talked to a couple managers.  The problem at this point is with me… for some reason I actually expected a different result.  I didn’t get one though.  The advanced escalation team fed me the same line of BS.  I even went around and around with them, showing them a simple test page and how the site worked on 3 other web servers, but they refused to listen to the simplest logic.  None of their support people know even the most basic principles of .net, so they only gave me asinine ideas.  And that’s assuming what they said was even an idea at all.  The first guy I talked to went to his 2nd-level support guy and came back and said, “He said you have to push a ticket to your web.config file and do something I didn’t understand.”  Now, I have no idea what ‘push a ticket to your web.config’ means, but even if it meant something, just the fact that he couldn’t finish the sentence means something, doesn’t it?  He should have at least gone back to the guy to ask for more details, but he wouldn’t.  He was more concerned with booting the call because he doesn’t support custom coding.  And of course, I’m not allowed to talk to the 2nd-level techs because I’m on a shared host and we’re not important enough to talk to the higher-level morons.  So I was refused and was only allowed to talk to 1st-level guys.  And every time my tech would go away to ask someone something he’d be gone for around 30mins.

I did talk to one manager who actually got what I was saying though and she really went to bat for me.  Unfortunately, she works in a sea of morons and there’s just no way she’ll ever make a difference.  So after a week of being down, I signed up with another host, and amazingly the site came online right away.  I guess my code just decided to work again and it really didn’t have anything to do with GoDaddy’s servers, huh?  I mean, it must have been my mistake for thinking that code would run on their servers through whatever they do to the box.  It certainly wasn’t because they were blocking .axd files all of a sudden.  Because if they had for some reason decided to start blocking those file types, I know they would be responsible enough to tell their customers they were making a major change like that.  So I know now that it HAD to be me.  I didn’t do something right in my code.  So the code that I thought was solid and had been working for 2yrs just stopped on its own.  How could I have been so stupid? 

This is a warning to anyone who’s considering putting their site on GoDaddy.  If you do you’ll be sorry because they make changes on their servers without telling you about it and then blame you when something goes wrong.  If I hadn’t gone to another host, my site would have to be completely re-coded to use nothing but static HTML or classic ASP.  It’s ridiculous, unprofessional, and shows a real apathy for their customers.  GoDaddy has proven to me again and again that they don’t give a crap about their customers.  We’re just a paycheck to them.  But you know what?  If they keep this up we won’t be a paycheck much longer.  CIHost pulled a bunch of the same crap and look what happened to them.

SO DON’T GO TO GODADDY FOR ANYTHING!!!