
What makes a Sr. DBA?

I get indirectly asked this question all the time… what makes a Sr. DBA?  Well, that question is gonna be answered 20 different ways by 10 different DBAs.  But here’s my answer.

Aside from whatever specific qualities you assign to what a Sr should look like, here’s a stick you can use to measure yourself.  You should be able to meet both of these criteria.

  1. Are you right most of the time?
  2. Do you handle your own problems instead of calling daddy?

 

Ok, let's talk about #1 first.  Nobody is right all the time, but are you right most of the time?  When you get called in on a crisis, do you diagnose the problem correctly, say, better than 90% of the time?  And do you discover and deduce the problem, or do you just fall into it?  Do your users come to you for answers?  In other words, have you earned your place as the go-to guy?

Next, how often do you call daddy?  If you’re in a shop with an existing Sr. DBA, do you call him when you have a problem or do you research and solve your own issues before getting him involved?  It’s always really easy to call the Lead DBA, but it doesn’t teach you anything.  And as long as you’re relying on his research skills you’ll never be the go-to guy yourself.

I remember it well.  Much longer ago than I care to remember, I asked whatever male figure I had around how you know when you're a man.  He told me something that has stuck with me all these years.  He said, you know you're a man when you quit calling your parents when you have trouble.  And I remember it hit me once when I was driving late at night and got a flat tire.  I just got out and changed it and went on my way.  A year earlier I would have called my folks to come help me.  That was my first hint that I may have crossed into manhood.  Because at some point you realize that it's up to you.

It’s the same in the IT world.  You go through these years of learning and then of getting proficient, and at some point it dawns on you that it’s all up to you and only you can solve your problems.  You have to be the one to investigate and solve the blocking, or the deadlocks, or the excessive waits, etc.

And that doesn’t mean that you never need any help with anything.  Nothing could be further from the truth, but how often do you need that external help?

Getting a good look at deadlocks

One of the things that's a major concern for DBAs is monitoring deadlocks.  Capturing them is really easy; the more difficult part is sifting through all the info once you've got it.  The deadlock graph in Profiler is my favorite tool for gathering them, and I prefer to shred the XML results into a table so I have all the info right there in front of me.

So here I've got a script I've written that shreds your deadlock XML into a relational table.  Actually, I modified this from the one Trevor Barkhouse did for his deadlocking presentation; I just wanted some friendlier cols and more info in the table.  Once you get the data into this format, though, you can see how easy it is to turn it into a real process.

What I love the most about this is how easy it is to go through a high number of deadlocks and query them for anything you like.  You can easily get a picture of your entire environment with very little effort.  Now, for this, you’ll have to get the deadlock graphs into XML to begin with and while I could show it to you here, Trevor does a good job of it so I’ll just point you to his blog on the topic.  However, once you have the XML, then here’s my query to mine it.
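By the way, if you'd rather not run a Profiler trace at all, here's a minimal sketch of another way to get at the raw deadlock XML.  It assumes SQL 2008 or later, where the system_health Extended Events session is on by default; the exact nesting under data/value shifts a little between versions, and the reports may need to be wrapped in a <deadlock-list> element to match the .xdl layout my query expects.  Treat it as a starting point, not gospel.

--Sketch: pull any deadlock reports the system_health session is holding in its ring buffer.
SELECT XEventData.XEvent.query('.') AS DeadlockReport
FROM
(
   SELECT CAST(st.target_data AS XML) AS TargetData
   FROM sys.dm_xe_session_targets st
   JOIN sys.dm_xe_sessions s
      ON s.[address] = st.event_session_address
   WHERE s.name = N'system_health'
     AND st.target_name = N'ring_buffer'
) AS RingBuffer
CROSS APPLY RingBuffer.TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS XEventData(XEvent);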

Here’s the code:
USE [tempdb];
GO

DECLARE @DeadlockCount INT;
DECLARE @DeadlockGraph XML;
DECLARE @DeadlockGraphFileName NVARCHAR(255);
DECLARE @DynamicSQL NVARCHAR(1024);

SET @DeadlockGraphFileName = 'G:\DeadlockFiles\DeleteCV3OrderGenerationQueue\AllEvents.xml';
--'G:\DeadlockFiles\20110405-20110418_1.xdl';

-- The OpenRowset() T-SQL function requires a string literal for the file
-- name, so use dynamic SQL to avoid hardcoding a path (other than in the
-- above assignment to the @DeadlockGraphFileName variable... but that's
-- a little clearer).
SET @DynamicSQL = N'SELECT @XMLDocument = CAST([BulkColumn] AS XML)' + NCHAR(10) +
   N'FROM OpenRowset' + NCHAR(10) +
   N'(' + NCHAR(10) +
   N'    BULK ''' + @DeadlockGraphFileName + N''',' + NCHAR(10) +
   N'    SINGLE_BLOB' + NCHAR(10) +
   N') AS [XMLDocument];';

PRINT @DynamicSQL;

-- Execute the dynamic SQL and pull the result into the @DeadlockGraph
-- local variable (via the @XMLDocument output parameter).
EXECUTE [master].[sys].[sp_executesql]
   @stmt = @DynamicSQL,
   @params = N'@XMLDocument XML OUTPUT',
   @XMLDocument = @DeadlockGraph OUTPUT;

-- Count the number of deadlock graphs in the XDL file.
SELECT @DeadlockCount = DeadlockList.Graphs.value('count(deadlock)', 'Int')
FROM @DeadlockGraph.nodes('/deadlock-list') AS DeadlockList(Graphs);

PRINT 'File            :   ' + @DeadlockGraphFileName + CHAR(10) +
      'Deadlock count  :   ' + CAST(@DeadlockCount AS VARCHAR);

-- Extract the parties to the deadlocks.
DECLARE @PartiesToTheDeadlocks TABLE
(
   [VictimProcessID] VARCHAR(25),
   [VictimLastBatchStarted] DATETIME2,
   [VictimLockMode] VARCHAR(15),
   [VictimXActID] VARCHAR(15),
   [VictimClientApp] VARCHAR(50),

   [LiveProcessID] VARCHAR(25),
   [LiveLastBatchStarted] DATETIME2,
   [LiveLockMode] VARCHAR(15),
   [LiveXActID] VARCHAR(15),
   [LiveClientApp] VARCHAR(50),

   [LiveFileID] VARCHAR(15),
   [LivePageID] VARCHAR(15),
   [LiveObjName] VARCHAR(50),
   [LiveLockModeHeld] VARCHAR(50),
   [VictimLockModeRequest] VARCHAR(50),

   [VictimFileID] VARCHAR(15),
   [VictimPageID] VARCHAR(15),
   [VictimObjName] VARCHAR(50),
   [VictimLockModeHeld] VARCHAR(50),
   [LiveLockModeRequest] VARCHAR(50),

   [VictimProcName] VARCHAR(100),
   [VictimExecStack] VARCHAR(MAX),
   [LiveProcName] VARCHAR(100),
   [LiveExecStack] VARCHAR(MAX),
   [VictimInputBuffer] VARCHAR(2048),
   [LiveInputBuffer] VARCHAR(2048)
);

INSERT INTO @PartiesToTheDeadlocks
SELECT VictimProcessID,
       VictimLastBatchStarted,
       VictimLockMode,
       VictimXActID,
       VictimClientApp,
       LiveProcessID,
       LiveLastBatchStarted,
       LiveLockMode,
       LiveXActID,
       LiveClientApp,
       LiveFileID,
       LivePageID,
       LiveObjName,
       LiveLockModeHeld,
       VictimLockModeRequest,
       VictimFileID,
       VictimPageID,
       VictimObjName,
       VictimLockModeHeld,
       LiveLockModeRequest,
       VictimProcName,
       VictimExecStack,
       LiveProcName,
       LiveExecStack,
       VictimInputBuffer,
       LiveInputBuffer
FROM
(
   SELECT
      --Victim process.
      DeadlockList.Graphs.value('(process-list/process[1]/@spid)[1]', 'NVarChar(15)') AS VictimProcessID,
      CAST(REPLACE(DeadlockList.Graphs.value('(process-list/process[1]/@lastbatchstarted)[1]', 'NChar(23)'), N'T', N' ') AS DATETIME) AS VictimLastBatchStarted,
      DeadlockList.Graphs.value('(process-list/process[1]/@lockMode)[1]', 'NVarChar(15)') AS VictimLockMode,
      DeadlockList.Graphs.value('(process-list/process[1]/@xactid)[1]', 'NVarChar(15)') AS VictimXActID,
      DeadlockList.Graphs.value('(process-list/process[1]/@clientapp)[1]', 'NVarChar(50)') AS VictimClientApp,
      --Live process.
      DeadlockList.Graphs.value('(process-list/process[2]/@spid)[1]', 'NVarChar(15)') AS LiveProcessID,
      CAST(REPLACE(DeadlockList.Graphs.value('(process-list/process[2]/@lastbatchstarted)[1]', 'NChar(23)'), N'T', N' ') AS DATETIME) AS LiveLastBatchStarted,
      DeadlockList.Graphs.value('(process-list/process[2]/@lockMode)[1]', 'NVarChar(15)') AS LiveLockMode,
      DeadlockList.Graphs.value('(process-list/process[2]/@xactid)[1]', 'NVarChar(15)') AS LiveXActID,
      DeadlockList.Graphs.value('(process-list/process[2]/@clientapp)[1]', 'NVarChar(50)') AS LiveClientApp,
      --Live resource.
      DeadlockList.Graphs.value('(resource-list/pagelock[1]/@fileid)[1]', 'NVarChar(15)') AS LiveFileID,
      DeadlockList.Graphs.value('(resource-list/pagelock[1]/@pageid)[1]', 'NVarChar(15)') AS LivePageID,
      DeadlockList.Graphs.value('(resource-list/pagelock[1]/@objectname)[1]', 'NVarChar(50)') AS LiveObjName,
      DeadlockList.Graphs.value('(resource-list/pagelock[1]/@mode)[1]', 'NVarChar(50)') AS LiveLockModeHeld,
      DeadlockList.Graphs.value('(resource-list/pagelock[1]/waiter-list/waiter/@mode)[1]', 'NVarChar(50)') AS VictimLockModeRequest,
      --Victim resource.
      DeadlockList.Graphs.value('(resource-list/pagelock[2]/@fileid)[1]', 'NVarChar(15)') AS VictimFileID,
      DeadlockList.Graphs.value('(resource-list/pagelock[2]/@pageid)[1]', 'NVarChar(15)') AS VictimPageID,
      DeadlockList.Graphs.value('(resource-list/pagelock[2]/@objectname)[1]', 'NVarChar(50)') AS VictimObjName,
      DeadlockList.Graphs.value('(resource-list/pagelock[2]/@mode)[1]', 'NVarChar(50)') AS VictimLockModeHeld,
      DeadlockList.Graphs.value('(resource-list/pagelock[2]/waiter-list/waiter/@mode)[1]', 'NVarChar(50)') AS LiveLockModeRequest,
      --Proc names, execution stacks, and input buffers.
      DeadlockList.Graphs.value('(process-list/process[1]/executionStack/frame/@procname)[1]', 'NVarChar(100)') AS VictimProcName,
      DeadlockList.Graphs.value('(process-list/process[1]/executionStack/frame)[1]', 'VarChar(max)') AS VictimExecStack,
      DeadlockList.Graphs.value('(process-list/process[2]/executionStack/frame/@procname)[1]', 'NVarChar(100)') AS LiveProcName,
      DeadlockList.Graphs.value('(process-list/process[2]/executionStack/frame)[1]', 'VarChar(max)') AS LiveExecStack,
      RTRIM(LTRIM(REPLACE(DeadlockList.Graphs.value('(process-list/process[1]/inputbuf)[1]', 'NVarChar(2048)'), NCHAR(10), N''))) AS VictimInputBuffer,
      RTRIM(LTRIM(REPLACE(DeadlockList.Graphs.value('(process-list/process[2]/inputbuf)[1]', 'NVarChar(2048)'), NCHAR(10), N''))) AS LiveInputBuffer
   FROM @DeadlockGraph.nodes('/deadlock-list/deadlock') AS DeadlockList(Graphs)
) AS [_ProcessDetails];

--  Display the parties to the deadlocks.
SELECT *
FROM @PartiesToTheDeadlocks
ORDER BY VictimLastBatchStarted, VictimXActID;

Now, the code formatter messed up the coloring, but the code works. 

Now, I'll list the cols and explain them because I know that's the next thing you're gonna want.  They're broken up into 2 categories… the Victim process and the Live process.  I didn't know what else to call the surviving process other than "live" and still keep it short enough to prefix the cols with.

VictimProcessID – spid of victim.

VictimLastBatchStarted – date/time of last batch for victim.

VictimLockMode – lock held by victim.

VictimXActID – trans ID for victim.

VictimClientApp – Victim client app.

LiveProcessID – spid of live process.

LiveLastBatchStarted – date/time of last batch for live process.

LiveLockMode – lock held by live process.

LiveXActID – trans ID within the batch for the live process (I think).

LiveClientApp – client app for live process.

LiveFileID – file ID where the locked resource for live process is held.

LivePageID – page ID the processes are fighting over.  The live process currently holds the lock on this one.

LiveObjName – object the processes are fighting over.  The live process currently holds the lock on this one.

LiveLockModeHeld – lock held on the above object by the live process.

VictimLockModeRequest – lock being requested  on the above object by the victim.

VictimFileID – file ID where the locked resource for the victim process is held.

VictimPageID – Page ID the processes are fighting over.  The victim currently holds the lock on this one.

VictimObjName – object the processes are fighting over.  The victim currently holds the lock on this one.

VictimLockModeHeld – lock held on the above object by the victim.

LiveLockModeRequest – lock being requested on the above object by the live process.

VictimProcName – the SP called by the victim.

VictimExecStack – the stmt in the SP that caused the deadlock.

LiveProcName – the SP called by the live process.

LiveExecStack – the stmt in the SP that caused the deadlock.

VictimInputBuffer – the exact SQL the victim sent to the SQL engine.

LiveInputBuffer – the exact SQL the live process sent to the SQL engine.

Now you can see at a glance where your problem deadlocks are, how often they’re occurring, and which resources are under the most contention.
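For instance, run something like this in the same batch as the script above (or against a permanent copy of the table) and you get a quick rollup of which objects are deadlocking the most:

--Example rollup: which objects show up in deadlocks the most.
SELECT VictimObjName,
       LiveObjName,
       COUNT(*) AS DeadlockCount
FROM @PartiesToTheDeadlocks
GROUP BY VictimObjName, LiveObjName
ORDER BY DeadlockCount DESC;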

Good stuff that.

A Lovely Statistics Query

Most of you know that you can set SQL to auto update stats.  And I'm sure that most of you know that it invalidates stats at about 20%.  What that means is that when about 20% of the rows in the table have changed, the stats are invalidated.  It's actually 500 rows + 20% if you wanna be specific.  And notice I say they're invalidated and not updated.  Stats aren't updated when that 20% mark is reached; they're only invalidated, and they're not rebuilt until they're needed.  Well, for really large tables that can result in stats being outdated longer than they should be, because the number of changes needed to invalidate them is much higher.  The same issue can occur with smaller tables where the changes are varied enough to throw the distribution out of whack.  Some tables are just more sensitive to change than others, and that 20% may be too much.  You may find that you need to update stats more often… say at 10%, or in extreme cases even 5%.
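To put a rough number on it: a 10,000,000-row table has to see about 500 + (20% of 10,000,000) = 2,000,500 modified rows before its stats are even marked invalid.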

Here's a sexy little query I put together that will give you the percentage of change an index has had.  You can use it not only to drive a job that updates stats at a different percentage, but also to just query it and see what percentage your tables are at and how fast they change.  It's an excellent tool for troubleshooting possible stats problems.  Either way, it's pretty handy.

--Get stats percentage of change for tables.
SELECT
so.SCHEMA_ID,
ss.NAME AS [Schema],
OBJECT_NAME(id) AS TableName,
si.NAME AS IndexName,
CAST((CAST(rowmodctr AS FLOAT)/CAST(rowcnt AS FLOAT))*100 AS INT) AS Pct,
rowcnt,
STATS_DATE([id], indid) AS [StatsDate]
FROM sys.sysindexes si
INNER JOIN sys.objects so
ON si.id = so.object_id
INNER JOIN sys.schemas ss
ON so.schema_id = ss.schema_id
WHERE rowcnt >= 1000000 --500
AND rowmodctr > 0
--AND OBJECT_NAME(id) = 'TableName'
ORDER BY Pct DESC, rowcnt DESC, rowmodctr

Here are a couple notes on the query:

1.  Notice there’s a way to query for a specific table.

2.  The StatsDate col shows you the last time the stats were updated.

3.  I'm also limiting the results to indexes with at least 1 million rows.  Feel free to lower that if you like.

4.  rowmodctr > 0 is something I threw in there to keep the divide by zero error out of there.  This also filters out the system-created stats.

OK, I hope you guys like this one.  I personally love it.  It’s allowed me to build a process to update stats on some of my tables more aggressively than 20% and keep my server running at its peak.
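If you're wondering what that kind of process looks like, here's a minimal sketch.  The 10% threshold and the cursor approach are just illustrations, not the exact job I run, so adjust it to your own environment.

--Sketch only: update stats on any big index that has crossed a 10% change threshold.
DECLARE @SchemaName SYSNAME, @TableName SYSNAME, @SQL NVARCHAR(1000);

DECLARE StatsCursor CURSOR FAST_FORWARD FOR
SELECT DISTINCT ss.name, OBJECT_NAME(si.id)
FROM sys.sysindexes si
INNER JOIN sys.objects so ON si.id = so.object_id
INNER JOIN sys.schemas ss ON so.schema_id = ss.schema_id
WHERE si.rowcnt >= 1000000
AND si.rowmodctr > 0
AND CAST((CAST(si.rowmodctr AS FLOAT)/CAST(si.rowcnt AS FLOAT))*100 AS INT) >= 10;

OPEN StatsCursor;
FETCH NEXT FROM StatsCursor INTO @SchemaName, @TableName;
WHILE @@FETCH_STATUS = 0
BEGIN
   SET @SQL = N'UPDATE STATISTICS ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N';';
   EXEC sp_executesql @SQL;
   FETCH NEXT FROM StatsCursor INTO @SchemaName, @TableName;
END
CLOSE StatsCursor;
DEALLOCATE StatsCursor;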

Troubleshooting a replication scenario

I just had a nice replication troubleshooting experience I thought I’d share with you guys.  I got a call from our app team saying that replication was failing.  I looked and the error was a bit unexplainable.

“Column names in each table must be unique.”

OK, so why the hell is it telling me that?  This publication has been running for months now with no problems.  I checked the table in question on both sides, and as it turns out the column it was complaining about had an int data type on the publication side and a uniqueidentifier on the subscription side.  So how did these columns get different data types?  Did someone change the data type on the subscriber to an incompatible data type?  No, that's probably not it.  And that's not the error message you would get for something like that anyway.  But someone had to change the data type right?  Well, only kinda.  Here's what actually happened.

One of the devs decided he needed this column in the table, and instead of going to the DBAs, he added it to the subscriber himself.  He then added it to the publisher (with a different data type).  So when replication went to replicate the DDL change, it saw that the subscription side already had the column and spit out that excellent error message.
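By the way, the "checked the table in question on both sides" part doesn't have to be fancy.  Something like this (the table name is made up), run on the publisher and then on the subscriber, will surface a data type mismatch right away:

--Run on both the publisher and the subscriber and compare the output.
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'YourReplicatedTable'   --hypothetical name
ORDER BY ORDINAL_POSITION;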

Let this be a lesson to those of you who have DBAs and constantly look for ways to circumvent them.  Why don't you stop every now and then and implement things properly?  And by properly I mean slow down and take the time to think about what you're doing.  The days are over when you can just do whatever you want because it crosses your mind… especially when you have replication running.  You have to plan schema changes or you'll spend tons of time troubleshooting and fixing your scenario.  Not to mention what it does to whatever processes rely on this replication process.  Anyway, that's all I've got for now.  Let's all be safe in production.

A Round of Patching

I just finished an interesting email thread with a user about a scenario that's not at all uncommon.  They were installing a CU on some of their SQL boxes, and evidently a new bug introduced in that CU causes their SSRS reports to render incorrectly.  The sad thing is that this is the kind of thing that should have been caught in testing.  Now of course, I don't expect them to find everything during testing, but a problem with simple report rendering should have been caught early on, because the second they turned their reporting structure back on, report rendering started messing up.  Now, he assures me that they tested it, but what exactly did they test?  Did they do any user testing at all, or did they just test the install of the patch itself?  I see that quite often.  I see shops claim to do patch testing, but all they're concerned with is whether the install itself will fail.

Most of the time I blame managers and companies in general for this, because the IT staff knows what's at stake more than anyone, but they're not given the proper time and resources it takes to properly test.  Managers always claim to know the business needs better, yet they never seem to fully grasp the implications of putting untested code into production.  All they want to know is, will this bring down my server?  Unfortunately that always means something different to them than it does to us.  What they mean is, will this fail and put us into an emergency situation?  And the answer is probably not.  However, disabling a critical piece of your business is putting you into an emergency situation.  And in this case they're lucky it's just report rendering.  What if it had been something more serious that caused some kind of logical corruption in their processing?

I would say that quite often the biggest problems caused by hotfixes are performance-related.  And by not testing a CU or hotfix properly you could bring your entire business to its knees.  You can slow your website down so much that nobody can get any real work done.  And what does that do for the reliability and confidence of your business?  It would be really nice sometimes if we in IT didn’t have to rely solely on people who know nothing about it to get our jobs done properly. 

And what's amazing to me is that companies today still, after all this time and all the mistakes, don't think it's necessary to hire competent people to code and test.  Oh sure, they pay lip service to it all day long, but what do they really do about it?  Do they really slow down their dev cycles enough to do benchmarking and architect solutions properly?  Do they spend the money necessary to ensure that their code and servers are solid by hiring enough competent people?  Because every shop I've been in works their people so hard there's no time for anything unexpected to happen.  So I think I've said this before, but now I'm saying it again.  It's time to stop being children about IT.  It's time to stop letting people who know nothing about it make the technical decisions that affect entire projects.  And it's definitely time to stop thinking we can get by without testing things properly.  And that includes functional testing as well as performance benchmarking.

Avoiding xp_cmdshell

This is gonna be a quick post, but I wanted to throw out a couple easy ways to avoid using xp_cmdshell.

Unfortunately this xp has gotten a bad rep, but only because it's been abused.  If you use it properly there's no reason why you can't continue to get some good use out of it for many years to come.  Actually, I use it for my backup routines for a couple reasons… first, when I wrote them there really wasn't a better choice.  And second, while I could architect a solution around it quite easily, it's ridiculous to go so far out of my way to code around something that isn't a huge problem.  The solution would be more complicated than the original problem, so why bother?  Oh yeah, and since I'm in a mixed shop there's also the problem of having something like PowerShell on every box, and I don't wanna have a different routine for my SQL2K boxes than I do for all the rest.  So using xp_cmdshell is still a good fit for me.

Anyway though, there are some good reasons to avoid it when it's not necessary (part of using it responsibly).  So we're going to talk about 2 XPs today that will help you do just that.

First, there’s xp_FixedDrives.  This will report all the drives on your server and the amount of free space in MB.  Now, this can’t replace a well-formed PS script that can give you a lot more info, but it’s a nice start when you just need to look at drive space real quick.
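There's nothing to it either:

--Free space in MB for each fixed drive.
EXEC master.dbo.xp_fixeddrives;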

Second, we have xp_create_subdir.  This handy little xp will let you create a directory.  You just pass it the path you want to create and you’re golden.  So it would look something like this:

EXEC master.dbo.xp_create_subdir 'C:\Mydir\Anotherdir\theFinalDir';

And just like in DOS it’ll build the whole path for you so none of those dirs have to exist ahead of time.

Anyway, hope this finds someone well.

Trace flag 1807

This is something I haven't had a use for in quite some time, but it actually came in handy today.  Of course, this isn't something you'd want to make a living doing, but you'll want to keep it in your hat and pull it out every couple of years or so when the situation arises.

This is all about solving problems.  Here’s the scenario.  A vendor is here and he’s trying to upgrade a DB schema.  The way the thing works is it basically copies the entire DB to new tables and then drops the old ones.  So at the apex of the operation the DB is about double in size.  The problem is that the server doesn’t have enough space to hold double the DB.  So what do you do?

Well, one of your options is to add space to the disk.  The problem there is that you can never just add space, right?  You have to provision it, etc.  So in this scenario this isn’t a good option at all.

Next you could find another SQL box on the same build and move the DBs over there.  You could then perform the upgrade on the other server, and then move the files back when you’re done.  Of course, the problem there is finding another box that doesn’t encroach on someone else’s production.  So you may or may not have this type of box lying around.

You could also think about just moving the files over to like a SAN or another server that has enough space and just attaching them across the wire.  So the way that would look would be for you to move the files to the other server (assume it doesn’t have SQL installed because you don’t need it), then attach the DB from the original server and point it back to the UNC path where the files are now.  Then once the conversion is done, all you have to do is move the files back to the original server and you’re golden.

But wait… there's a problem.  If you attach files across the network, SQL will detect that they're not local and throw an error.  And even if you map a drive you'll get the same result.  So how can we get around SQL checking for network locations?  That's where trace flag 1807 comes into play.  Just running a simple DBCC TRACEON (1807) and then attaching your files will disable the network check and allow you to attach your files.
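Roughly, the whole thing looks like this.  The server name, share, and DB name are made up for illustration, and you may want the extra -1 argument to turn the flag on globally instead of just for your session:

--Sketch: turn off the local-disk check, then attach the files from the UNC path.
DBCC TRACEON (1807);   --add , -1 to make it global

CREATE DATABASE [VendorDB]
ON (FILENAME = N'\\OtherServer\SQLShare\VendorDB.mdf'),
   (FILENAME = N'\\OtherServer\SQLShare\VendorDB_log.ldf')
FOR ATTACH;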

Pretty cool huh?

And for those of you who like to see things in action, I’ve got this video of how this all works.

http://midnightdba.itbookworm.com/VidPages/Trace1807/Trace1807.aspx

My morning so far

Ok, aside from being kinda sick still, my morning has been filled with interesting issues.  Ahhh– the life of a prod DBA. 

It started today with space issues–again.  We’re just running out of space everywhere and it’s hard to even keep some DBs running because of the problems.  So I’m having to shrink some DBs just to keep them active. 

Now this one just gets me a little.  Had a vendor come up to me to ask for help.  He was on one of the prod servers and detached 3 of the DBs and couldn't get them back.  Turns out he had moved the files and renamed them and expected SQL to know where they were.  He said he had problems initially and that's why he did it, but he got stuck when he couldn't attach them again and point SQL to the new files.  So I got that worked out for him with relatively little effort.

Now this next one is just interesting.  I just switched our backup routine on our big system to back up to 20 files.  The dev downstairs had a routine to restore it to a different server (I didn't know that), and his routine was bombing.  He had re-written it to use the 20 files, but it was still failing.  Now, I'll say that doing it the way he did doesn't make him dumb.  In fact, I could very easily see anyone making a similar mistake, because to someone who doesn't know backup intimately it seems like the kind of thing you should be able to do.  What he did was build a dynamic string to hold the file name values.  So in the string he had something like this:

  

SET @BackupDatabaseFileNamePath = 'DISK = N' + '''' + '\\' + @ProdIP + '\' +
    LEFT(@BackupDatabaseFileNamePath, 1) + '$' +
    RIGHT(@BackupDatabaseFileNamePath, (LEN(@BackupDatabaseFileNamePath) - 2)) + ''''

And so he ended up with a string that looked like this:

DISK = '\\127.0.0.1\K$\SQLBackups\ServerName\DBName\01of20FullPRD20101102010001.BAK',

And he did that 20 times, once for each file.  So now his actual restore command looked like this:

RESTORE DATABASE PRD
FROM @BackupDatabaseFileNamePathInc

And that looks like it should work, because when you print it you wind up with a perfect restore cmd.  The problem is that the restore cmd doesn't work like that.  It takes params, or flags if you will, right?  And one of those flags is 'DISK ='.  That means that the flag itself is 'DISK =', not a string that contains that same text.  It's a subtle difference to us, but not to the restore cmd.  So if you want to build a string like that for either a backup or a restore, then you have to build a string that contains the entire backup or restore cmd, and not just a single part that includes the params.

Here’s an example of something you can do though: 

DECLARE @file VARCHAR(100)
SET @file = 'c:\SSISPkgMgmt.bak'

RESTORE DATABASE SSISPkgMgmt
FROM DISK = @file
WITH REPLACE

And what he was trying to do was this: 

DECLARE @file VARCHAR(100)
SET @file = 'DISK = ''c:\SSISPkgMgmt.bak'''

RESTORE DATABASE SSISPkgMgmt
FROM @file
WITH REPLACE

 
If you run something like this you'll see that SQL treats the whole string as the name of a backup device, because that's what the restore syntax expects in that position.
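And if you really do need the DISK = clause to live in a string, then the string has to be the whole command.  Here's a quick sketch (the path is made up):

--Sketch: build the entire RESTORE statement and execute it.
DECLARE @cmd NVARCHAR(2000);

SET @cmd = N'RESTORE DATABASE SSISPkgMgmt ' +
           N'FROM DISK = N''c:\SSISPkgMgmt.bak'' ' +
           N'WITH REPLACE;';

PRINT @cmd;   --sanity check before running it
EXEC sp_executesql @cmd;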

So anyway, big fun again today too.

Even MVPs make mistakes

We’re in the middle of our last mock go-live before this weekend’s prod change-over.  We’re using an SRDF process to move the DB files from the current prod to the new prod.  This is our big SQL R2 cluster and what usually happens is that they present the drives to the cluster and then I run my script to bring the DBs back online.

Well, what happened this time is that the drives weren't added as SQL dependencies, and therefore SQL couldn't see them in order to bring the DBs back online.  Well, I didn't think to check that.  Instead, I just deleted the DBs, thinking that starting from scratch would be the best way to go.  What ended up happening, though, is that SQL deleted the newly presented data files.  So the moral of this story is that even though SQL can't see the drives well enough to attach the files, it can apparently see them well enough to delete them behind your back.  Thanks, clustering team!

And so this isn't entirely just a complaining session: here's the query you can use to see which drives your clustered instance of SQL can see.

SELECT * FROM sys.dm_io_cluster_shared_drives

Now if you've got drives in the cluster that SQL can't see, all you have to do is add them as a dependency to the SQL service and you'll be fine.  And in Windows 2008 you can do that while the service is online, but in 2003 and below you have to take SQL offline to add them.

Oh, and just for completeness, you can tell which nodes are in your cluster by using this:

SELECT * FROM sys.dm_os_cluster_nodes

Here's a quick update: the errors I caused and fixed this morning had some residual problems, because once you make a mistake and try to fix it, you've already got mistakes on your mind and you're not thinking clearly.  This is why I'm always advocating having restore scripts ready to go so you don't have to think about them in a crisis.  You always want things to be as easy as possible.  And this is also why I'm always saying that you should have standardized troubleshooting scripts and everyone should know how to use them.  You don't want individual DBAs inventing new methods on the fly.  You should all know what scripts are being run to troubleshoot if at all possible.

Post-install FUN!

This'll be a quick one, but I wanted to throw you guys another useful script.  Now, whether or not you decide to make this work in PS is up to you.  Most of my post-install stuff is still in T-SQL, so that's how I'm giving this to you.  Anyway, it's something that gets overlooked quite often, so I thought I'd just throw it up here.

There are 2 things I like to do right away when I install SQL.  One is to set the job history max rows and the max rows per job, and the other is to set the number of error logs SQL will keep before deleting them. 

So this first one configs the job logging.  I like to keep a lot of logs for my jobs.  I’ve found that more often than not the default setting isn’t adequate if you want a real picture of what’s happening with your jobs.  So I increase it right off the bat.

USE [msdb]
GO
EXEC msdb.dbo.sp_set_sqlagent_properties
   @jobhistory_max_rows = 100000,
   @jobhistory_max_rows_per_job = 10000
GO

 

This 2nd script is what I use to set how many of the logs get stored.  Of course, you could also use a simple PS script to save the logs off to a different folder so you have them on hand if you like, but that’s up to you.

USE [master]
GO
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'NumErrorLogs', REG_DWORD, 30
GO
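And if you want to verify the setting took, the companion xp_instance_regread should read it back.  This one's a sketch too, since the xp is undocumented, so test it on your build first:

--Read the error log setting back to verify it.
DECLARE @NumErrorLogs INT;
EXEC master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
    N'Software\Microsoft\MSSQLServer\MSSQLServer',
    N'NumErrorLogs',
    @NumErrorLogs OUTPUT;
SELECT @NumErrorLogs AS NumErrorLogs;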

The thing I really like about both of these scripts is that they work on SQL2K and up.  I’ve tested these all the way up to R2 and they’re both just fine.  So if you don’t have some standardized setup scripts you run on all your boxes when you install them, then let these be your first.  The more things are the same in your shop the easier it’ll be to admin things.  You’ll know what’s happening on your boxes and you have fewer things to remember.  And scripting becomes much easier too because if things run the same way and are named the same thing, then you know exactly what to look for in your scripts. 

Have fun.