A Lovely Statistics Query

Most of you know that you can set SQL to auto update stats.  And I’m sure that most of you know that it invalidates stats at about 20%.  What that means is that when about 20% of the data in the table has changed, the stats are invalidated.  It’s actually 500 rows + 20% of the table if you wanna be specific.  And notice I say they’re invalidated and not updated.  Stats aren’t updated when that 20% mark is reached, they’re only invalidated.  They’re not rebuilt until they’re needed.  Well, for really large tables that can result in stats being outdated longer than they should be, because the number of changes needed to invalidate them is much higher.  The same issue can occur with smaller tables where the changes are varied enough to throw the distribution out of whack.  Some tables are just more sensitive to change than others and that 20% may be too much.  You may find that you need to update stats more often… say at 10% or in extreme cases even 5%.
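To put rough numbers on that, here’s a minimal sketch that lists each user table’s row count next to the approximate number of changes it would take to hit the 500 + 20% mark.  It reads sys.partitions, which is my choice for the illustration only, and the column aliases are made up.

```sql
-- Sketch: approximate number of modifications needed to invalidate stats
-- under the 500 + 20% rule, per user table.
SELECT
    OBJECT_SCHEMA_NAME(p.object_id) AS [Schema],
    OBJECT_NAME(p.object_id)        AS TableName,
    SUM(p.rows)                     AS TotalRows,
    500 + (0.20 * SUM(p.rows))      AS ApproxChangesToInvalidate
FROM sys.partitions p
INNER JOIN sys.objects o
    ON p.object_id = o.object_id
WHERE p.index_id IN (0, 1)      -- heap or clustered index only, so rows are counted once
  AND o.is_ms_shipped = 0       -- skip system objects
GROUP BY p.object_id
ORDER BY TotalRows DESC;
```

A 100 million row table works out to 20,000,500 changes before its stats are invalidated, which is why the big tables are the ones that tend to drift.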

Here’s a sexy little query I put together that will give you the percentage of change an index has had.  You can use it not only to drive a job that updates stats at a different percentage, but also to just see what percentage your tables are at and how fast they change.  It’s an excellent tool for troubleshooting possible stats problems, and it’s pretty handy all around.

--Get stats percentage of change for tables.

SELECT
ss.NAME AS [Schema],
OBJECT_NAME(id) AS TableName, si.NAME AS IndexName,
CAST((CAST(rowmodctr AS float)/CAST(rowcnt AS float))*100 AS int)  AS Pct,
STATS_DATE([id], indid) AS [StatsDate]
FROM sys.sysindexes si
INNER JOIN sys.objects so
ON si.id = so.object_id
INNER JOIN sys.schemas ss
ON so.schema_id = ss.schema_id
WHERE rowcnt >= 1000000--500
AND rowmodctr > 0
--AND OBJECT_NAME(id) = 'TableName'
ORDER BY Pct DESC, rowcnt DESC, rowmodctr

Here are a couple notes on the query:

1.  Notice there’s a way to query for a specific table.  Just uncomment the OBJECT_NAME(id) line and put in your table name.

2.  The StatsDate col shows you the last time the stats were updated.

3.  I’m also limiting the rowcount to indexes with more than 1mill rows.  Feel free to lower that if you like.

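4.  rowmodctr > 0 is something I threw in there to skip anything that hasn’t changed.  This also filters out the system-created stats.  Strictly speaking, rowcnt is the divisor, so it’s the rowcnt filter that keeps the divide by zero error out of there.  Lower it all you want, but don’t remove it.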

OK, I hope you guys like this one.  I personally love it.  It’s allowed me to build a process to update stats on some of my tables more aggressively than 20% and keep my server running at its peak.
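If you want to see what a more aggressive process might look like, here’s a minimal sketch and not the actual job I run.  It reuses the query above to build UPDATE STATISTICS commands for anything past a threshold.  The @PctThreshold variable, the 10% value, and the is_ms_shipped filter are assumptions for the example.

```sql
-- Sketch: generate UPDATE STATISTICS commands for indexes whose
-- percentage of change is at or above a chosen threshold.
DECLARE @PctThreshold int = 10;        -- assumed threshold; pick what suits the table
DECLARE @sql nvarchar(max) = N'';

SELECT @sql = @sql
    + N'UPDATE STATISTICS ' + QUOTENAME(ss.name) + N'.' + QUOTENAME(so.name)
    + N' ' + QUOTENAME(si.name) + N';' + CHAR(13) + CHAR(10)
FROM sys.sysindexes si
INNER JOIN sys.objects so
    ON si.id = so.object_id
INNER JOIN sys.schemas ss
    ON so.schema_id = ss.schema_id
WHERE si.rowcnt >= 1000000
  AND si.rowmodctr > 0
  AND si.name IS NOT NULL              -- heaps have no index name to update
  AND so.is_ms_shipped = 0             -- skip system objects
  AND (CAST(si.rowmodctr AS float) / NULLIF(CAST(si.rowcnt AS float), 0)) * 100 >= @PctThreshold;

PRINT @sql;                            -- review first; PRINT truncates long strings
-- EXEC sys.sp_executesql @sql;        -- uncomment to actually run the updates
```

Printing the commands first lets you eyeball what would be touched before you schedule it.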

Troubleshooting a replication scenario

I just had a nice replication troubleshooting experience I thought I’d share with you guys.  I got a call from our app team saying that replication was failing.  I looked and the error was a bit unexplainable.

“Column names in each table must be unique.”

OK, so why the hell is it telling me that?  This publication has been running for months now with no problems.  I checked the table in question on both sides, and as it turns out the column it was complaining about had an int data type on the publication side and a uniqueidentifier on the subscription side.  So how did these columns get different data types?  Did someone change the data type on the subscriber to an incompatible data type?  No, that’s probably not it.  And that’s not the error message you would get for something like that anyway.  But someone had to change the data type, right?  Well, only kinda.  Here’s what actually happened.

One of the devs decided he needed this column in the table, and instead of going to the DBAs, he added it to the subscriber himself.  He then added it to the publisher (with a different data type).  So when replication went to replicate the DDL change, it saw that the subscription side already had the column and spit out that excellent error message.
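Checking the table on both sides can be as simple as running the same metadata query on the publisher and the subscriber and comparing the output.  This is a minimal sketch; 'TableName' is a placeholder, and INFORMATION_SCHEMA.COLUMNS is just one of several places you could look.

```sql
-- Sketch: run on both the publisher and the subscriber, then compare.
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    COLUMN_NAME,
    DATA_TYPE,
    CHARACTER_MAXIMUM_LENGTH,
    IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'TableName'         -- placeholder table name
ORDER BY ORDINAL_POSITION;
```

A column that exists on both sides with different data types, like the int vs. uniqueidentifier here, shows up immediately when you put the two result sets side by side.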

Let this be a lesson to those of you who have DBAs and you constantly look for ways to circumvent them.  Why don’t you stop every now and then and implement things properly.  And by properly I mean slow down and take the time to think about what you’re doing.  The days are over when you can just do whatever you want because it crosses your mind… esp when you have replication running.  You have to plan schema changes or you’ll spend tons of time troubleshooting and fixing your scenario.  Not to mention what it does to whatever processes rely on this replication process.  Anyway, that’s all I’ve got for now.  Let’s all be safe in production.

Pinging SQL Server in Powershell

It quite often happens that there will be some kind of issue keeping your app from hitting the DB.  Sometimes the issue is with SQL itself, but most of the time there’s something else involved.  The problem is that when it’s reported through the error stack at the app level, it only says that the DB was unavailable, and it’s hard to get users to understand that it’s just a generic error that means the app couldn’t talk to the DB for whatever reason.  That reason could be the NIC, a cable, the OS, a switch, a router, etc.  There are many reasons why an app wouldn’t be able to get to a SQL box.  And it’s made even worse if it’s intermittent.

So a good way to handle this is to put yourself a ping script up that will actually query SQL.  Not check that the server is up, or that the service is running, but actually connect to SQL itself and run a query.  I’ve done this for you in powershell.  It runs a simple select and then writes a result to a txt file.  I set up an agent job to run every 5secs and run this code from a dos cmd task.


add-pssnapin sqlservercmdletsnapin100

$date = get-date

$a = Invoke-Sqlcmd -ServerInstance Servername -Database master -Query "select @@servername" -ErrorAction silentlyContinue -ErrorVariable err
if ($err.count -eq 0) {$a = "OK"}
else {$a = "Failed"}

# if (!$a) {$b = "Failed"}
"$date  :  $a" | Out-File c:\VI2conn.txt -append

This kind of script can help in a number of ways, and depending on the level of troubleshooting you want to do, you can place this on different boxes.  So you can place it on the local SQL box to have it test itself, or in another data center to test a specific link, or just in the same subnet or vlan, or put it in the same subnet as the app and then in a couple different ones.  How you spread it around depends on what you’re looking for.  But this can be a tiebreaker because if the apps people insist that SQL is down because of a generic error message, you can tell them that 2 boxes from 2 other subnets were able to ping it every 5secs, but the one from the app’s subnet failed 6 times or whatever.  This way you can also show something solid to the network guys and make it easier for them to find the problem on their end. 

Now, this won’t tell you if it’s a switch, firewall, NIC, OS, or what, but it will tell you that it’s not a problem with SQL itself or the SQL box.  Of course, if a couple of them fail then it could still be the SQL box cause again, it won’t tell you that either, but it could be a tiebreaker in the right circumstance.

AND, just because I’m a nice guy here’s the job script too.  I did this in SQL2K8.



/****** Object: Job [SQLPing] Script Date: 02/04/2011 14:20:44 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [[Uncategorized (Local)]]] Script Date: 02/04/2011 14:20:44 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'SQLPing',
    @enabled=1,
    @notify_level_eventlog=0,
    @notify_level_email=0,
    @notify_level_netsend=0,
    @notify_level_page=0,
    @delete_level=0,
    @description=N'No description available.',
    @category_name=N'[Uncategorized (Local)]',
    @owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [Ping] Script Date: 02/04/2011 14:20:45 ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Ping',
    @step_id=1,
    @cmdexec_success_code=0,
    @on_success_action=1,
    @on_success_step_id=0,
    @on_fail_action=2,
    @on_fail_step_id=0,
    @retry_attempts=0,
    @retry_interval=0,
    @os_run_priority=0, @subsystem=N'CmdExec',
    @command=N'powershell "c:\SQLPing.ps1"',
    @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every 5secs',
    @enabled=1,
    @freq_type=4,
    @freq_interval=1,
    @freq_subday_type=2,
    @freq_subday_interval=10,
    @freq_relative_interval=0,
    @freq_recurrence_factor=0,
    @active_start_date=20110204,
    @active_end_date=99991231,
    @active_start_time=0,
    @active_end_time=235959,
    @schedule_uid=N'0b2e4594-92bc-438c-8311-d40076b53042'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:


My morning so far

Ok, aside from being kinda sick still, my morning has been filled with interesting issues.  Ahhh– the life of a prod DBA. 

It started today with space issues–again.  We’re just running out of space everywhere and it’s hard to even keep some DBs running because of the problems.  So I’m having to shrink some DBs just to keep them active. 

Now this one just gets me a little.  Had a vendor come up to me to ask for help.  He was on one of the prod servers and detached 3 of the DBs and couldn’t get them back.  Turns out he had moved the files and renamed them and expected SQL to know where they were.  He said he had problems initially and that’s why he did it, but he got stuck when he couldn’t detach them again to point SQL to the new files.  So I got that worked out for him with relatively little effort.  

Now this next one is just interesting.  I just switched our backup routine on our big system to backup to 20 files.  So the dev downstairs had a routine to restore it to a different server (I didn’t know that) and his routine was bombing.  He had re-written it to use the 20 files, but it was still failing.  Now, I’ll say that doing it the way he did doesn’t make him dumb.  In fact, I could very easily see anyone making a similar mistake because to someone who doesn’t know backup intimately, it seems like the kind of thing you should be able to do.  What he did was he was building a dynamic string to hold the file name values.  So in the string he was saying something like this: 


SET @BackupDatabaseFileNamePath = 'DISK = N' + '''' + '\\' + @ProdIP + '\' + LEFT(@BackupDatabaseFileNamePath,1) + '$' +
RIGHT(@BackupDatabaseFileNamePath,(LEN(@BackupDatabaseFileNamePath)-2)) + ''''

And so to that end, he was ending up with a string that looked like this: 

DISK = '\\\K$\SQLBackups\ServerName\DBName\01of20FullPRD20101102010001.BAK',

And he did that 20 times, once for each file.  So now his actual restore command looked like this: 

RESTORE DATABASE DBName -- substitute your database name
FROM @BackupDatabaseFileNamePathInc

And that looks like it should work because when you print it, you wind up with a perfect restore cmd.  The problem is that the backup and restore cmds don’t work like that.  They take params, or flags if you will, right?  And one of those flags is ‘DISK =’.  That means that the flag itself is ‘DISK =’, not a string that contains that same text.  It’s a subtle difference to us, but not to the cmd.  So if you want to build a string like that for either a backup or a restore, then you have to build a string that contains the entire cmd and execute that, not just a single part that includes the params.

Here’s an example of something you can do though: 


DECLARE @file VARCHAR(100)
SET @file = 'c:\SSISPkgMgmt.bak'

RESTORE DATABASE DBName -- substitute your database name
FROM @file
WITH replace

And what he was trying to do was this: 

DECLARE @file VARCHAR(100)
SET @file = 'DISK = ''c:\SSISPkgMgmt.bak'''

RESTORE DATABASE DBName -- substitute your database name
FROM @file
WITH replace

If you run something like this you’ll see that SQL views it as a backup device because that follows the restore syntax.
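For a multi-file restore, building the whole cmd as one string could look something like the sketch below.  This is not the dev’s actual routine.  The @Files table, the file paths, the server name in them, and the DBName database are all placeholders I made up for the example.

```sql
-- Sketch: build the ENTIRE restore command as a string, then execute it.
DECLARE @DBName sysname = N'DBName';                   -- placeholder database name
DECLARE @Files TABLE (FilePath nvarchar(260));         -- hypothetical list of backup files

INSERT INTO @Files (FilePath)
VALUES (N'\\BackupServer\K$\SQLBackups\01of03Full.BAK'),   -- placeholder paths
       (N'\\BackupServer\K$\SQLBackups\02of03Full.BAK'),
       (N'\\BackupServer\K$\SQLBackups\03of03Full.BAK');

-- One "DISK = N'...'" clause per file, comma separated.
DECLARE @DiskList nvarchar(max) = N'';

SELECT @DiskList = @DiskList
    + CASE WHEN @DiskList = N'' THEN N'' ELSE N',' + CHAR(13) + CHAR(10) END
    + N'DISK = N''' + REPLACE(FilePath, N'''', N'''''') + N''''
FROM @Files;

DECLARE @sql nvarchar(max) =
      N'RESTORE DATABASE ' + QUOTENAME(@DBName) + CHAR(13) + CHAR(10)
    + N'FROM ' + @DiskList + CHAR(13) + CHAR(10)
    + N'WITH REPLACE;';

PRINT @sql;                          -- check the command before running it
-- EXEC sys.sp_executesql @sql;      -- uncomment to actually restore
```

The difference from what he tried is that the DISK keywords are part of the statement text that gets executed, not the contents of a variable handed to RESTORE.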

So anyway, big fun again today too.

My ridiculous day

Yeah, some days it just doesn’t pay to even try to do a good job.  Not only are my sinuses really giving me a raging headache today, but these are the ridiculous things I’ve been engaged in on top of it.

I had to tshoot an ssis pkg because it stopped working when I moved it to the dev box.  As it turns out the moron who wrote it put the values in the config file like I told him to, but he only put part of them in there.  The rest are still hardcoded in the pkg so of course it was failing.

I’m installing SQL R2 ent. on a VM with 1GB of RAM.

I turned AutoShrink back on for a DB because the business owner is scared to death it’ll blow something up if I don’t and they wanted to check with the vendor to make sure I wasn’t going to kill the DB.  Shoot me now.

I BCPd a couple large tables out to a different server and zipped them up.  I had to do this because the server is running out of space and the server team says they can’t get any more right now.  So in order to keep the DB running I’ve had to take a couple tables out of the DB so there’s room for normal ops.  This won’t end well.

Heard back from the support guy about the AutoShrink issue above.  He’s not sure but he’s pretty sure that a major change like that will void our support contract.  Really?  On the day my head is pounding so hard?  Consider yourself lucky this time.

Going to lunch soon.

Even MVPs make mistakes

We’re in the middle of our last mock go-live before this weekend’s prod change-over.  We’re using an SRDF process to move the DB files from the current prod to the new prod.  This is our big SQL R2 cluster and what usually happens is that they present the drives to the cluster and then I run my script to bring the DBs back online.

Well what happened this time is that the drives weren’t added as SQL dependencies and therefore SQL couldn’t see them in order to bring the DBs back online.  Well, I didn’t think to check that.  Instead what I did was I just deleted the DBs thinking that just starting from scratch would be the best way to go.  What ended up happening though is that SQL deleted the newly presented data files.  So the moral of this story is that even though SQL can’t see the drives to be able to attach the files, it can apparently see them well enough to delete them behind your back.  Thanks clustering team!

And so this isn’t entirely just a complaining session, here’s the query you can use to see which drives your clustered instance of SQL can see.

SELECT * FROM sys.dm_io_cluster_shared_drives

Now if you’ve got drives in the cluster that SQL can’t see, all you have to do is add them as a dependency to the SQL service and you’ll be fine.  And in Windows 2008 you can do that while the service is online, but in 2003 and below you have to take SQL offline to add them.

Oh, and just for completion, you can tell which nodes are in your cluster by using this:

SELECT * FROM sys.dm_os_cluster_nodes

Here’s a quick update:  The errors I caused and fixed this morning had some residual problems.  Because once you make a mistake and try to fix it, you’ve already got mistakes on your mind and you’re not thinking clearly.  This is why I’m always advocating having restore scripts ready to go in a manner that you don’t have to think about it in a crisis.  You always want things to be as easy as possible.  And this is also why I’m always saying that you should have standardized troubleshooting scripts and everyone should know how to use them.  You don’t want individual DBAs inventing new methods on the fly.  You should all know what scripts are being run to troubleshoot if at all possible.

What does a bad query look like?

In my SQL Marklar blog today I discussed troubleshooting DB processes.  And I’m not going to re-hash all of it here but I did want to tell you about a use case that describes perfectly what I was talking about.

Not so long ago I got a call from one team and they told me that they had some server issues.  Everything moving slow they said.  Ok, so I got on and took a look and nothing was really jumping out at me.  Then I put a profiler trace on it to see if anything jumped out at me.  And of course, I knew nothing about the app or the processes so I really didn’t know what I was looking for, but you’ve gotta start somewhere huh?

So there I am in profiler and I’m just looking for long-running queries.  The problem is there were lots of queries I would consider long-running.  For some reason I focused in on a single SP that was taking like 5mins.  I pulled up the text of the SP and started looking through it.  It all seemed fairly standard.  I mean, it was long and everything wasn’t perfect, but there was nothing out of the ordinary. 

I contacted the app guy again and asked about it.  Does this SP typically take this long to run?  No, he says (and those of you who have seen My Cousin Vinny know where this is going).  So I thought eureka, I actually found something that may fix the issue.  So I got a couple valid params from him and ran the SP with them.  I also made sure to turn on execution plans and statistics io.  The query plan had some dings in it that experience has told me could easily have caused this kind of spike in resource usage.  The problem is that there was no fragmentation, and stats were up to date.  And in talking with the app guy he told me that they just archived a bunch of the data so it was down to like 200mill rows now.  So why would this thing be taking so long to return?  Moving on.

I found a copy of his QA system that had been copied over from prod the previous week and he assured me that they had changed nothing.  I could see the extra rows in the tables (copied before the archival), and the indexes were the same as in prod so that wasn’t the issue.  They had the same fill factor, everything.  In fact, everything I checked was identical except for the amount of data.  So why would having less data cause such a huge performance issue?  Moving on.

I decided that running this thing again and again on prod was probably a bad idea.  I’m just adding to the issue.  So I started doing the rest of my work on his QA box where I was the only spid.  And the hardware was similar as well (I love it when it works out that way).  So I ran the SP on this box and 5mins passed.  Then 10mins.  Then 15mins.  Then 20mins.  And sometime soon after that, the query returned.  I had collected all my stats along the way so I was golden.  It was getting the same execution plan as the prod version.  The results aren’t what I expected at all.  Why is the prod version now performing well in comparison?  So I called the app guy again and explained the situation.  Here’s more or less how the conversation went:

Me:  You know, I just don’t know what’s going on here.  I’ve been looking at this for a long time now and I’m getting further into a hole.  The prod version of this SP takes 5mins, and that’s even after the archival.  But when I run it on QA with all the data there it takes even longer.  If the prod query is acting up then I would expect the QA query to be a shorter time even with the extra data.

Guy:  Yeah that sounds about right to me.

Me:  What sounds right to you?  (I just got a bad feeling that something horrible had gone wrong)  (You know how you can instantly drop all the pieces into place and remember key words that make everything all of a sudden fit together?  Well, I just got that, but I wanted to hear him say it.)

Guy:  This SP usually takes about that much time, but since the archival it went down to 5mins.  We’ve been very pleased.

Me:  So you mean to tell me that when I came to you with this you didn’t find it necessary to tell me that the 5mins was an improvement?

Guy:  Well, I don’t know anything about DBs so I figured you could see that kinda thing already.

Me:  I see.  Well that clears up that mystery.  Now I’ve gotta go back and start over with this whole process.

Guy:  Well I can tell you the one that’s probably causing the issue.

Me:  Oh yeah?  How’s that?

Guy:  Because the slowness is only in our billing section and that’s controlled by just a few queries.  I can give you the names of the SPs and you can look at those.  There are only like 5 of them and since we’re having a hard time pulling up a list of clients it’s likely going to be the one that controls the search on that.

Me:  I see.  So you were just never going to tell me that?  I’ve been messing with this for 2hrs and you could have given me all this info before and saved me tons of time.

Guy:  Well, again, I don’t know anything about DBs and I figured you could see all that.

Me:  You thought I could see the web app from the DB?

Guy:  You can’t?

Me:  Kill me.

So ok, it turned out to be one of the 5 he gave me.  It had a bad query plan.  I wasn’t able to determine that all on my own, btw.  I had to recompile each one of them until I found the bad one.  And that’s because I didn’t have a perf baseline like I discussed on Marklar.
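If you haven’t done the recompile trick before, it’s a one-liner per proc.  This is only a sketch, and dbo.usp_SearchClients is a made-up name standing in for one of the 5 SPs.

```sql
-- Sketch: mark a stored procedure so it gets a fresh plan the next time it runs.
-- dbo.usp_SearchClients is a hypothetical name.
EXEC sp_recompile N'dbo.usp_SearchClients';
```

Run it for one SP, have the app guy retest, and move on to the next one until the slowness goes away.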

So there are a couple lessons to learn here but I think you can glean them for yourself.  The whole point of this though is that making assumptions about processes is bad and no matter what looks like a long-running query to you, it may in fact be performing better than usual.

Why can’t voodoo be real?

I had a very interesting talk with a vendor this morning.  Ok, it wasn’t really interesting as much as maddening.  I swear, it’s guys like this that are going to be the death of me.  I know I shouldn’t let them get to me but sometimes they do.  You know how it goes, you try to explain something and it just escalates until you’re upset.  Anyway…

OK, so they were having blocking issues in the DB.  The vendor was trying to convince the app team to turn off the SQL backups because they were interfering with their processes.  That’s when they called me.  Frankly, I can’t believe they called because the vendor was very adamant and this is the kind of thing our app groups do on their own without consulting us at all, so I’m more shocked than anything.  But I knew this was going to be a bad call the second my app guy told me what was going on.

Here’s the conversation as it took place just a couple hrs ago.  It took me this long to regain my strength.

V = vendor

M = me

M: Ok, I’m here now.  So explain to me what’s going on.

V: Well, we’re having severe blocking issues in the DB and after looking at it we noticed it’s the SQL backups causing it.  So what we need to do is stop the SQL backups.

M: Backups don’t hold locks in the DB so there’s no way for it to be causing blocking.

V: Of course backups hold locks.  Everything holds locks.

M: No, they don’t hold locks.  There’s never been a case where a backup has blocked a process.

V: We see it all the time.  And you’ll just have to trust me because I do this for a living.  I’m very familiar with this app and this is how we always fix this issue.

M: You always fix your poor coding issues by having people not backup their data?

V: You can backup your data, but you have to schedule downtime every week to do it… to keep it from blocking.

M: I really don’t want to finish this conversation, but I’m developing this morbid curiosity about where this is leading.

V: What does that mean?

M: Well, you clearly know nothing about SQL and you’re doing your customers such a huge disservice by making them turn off their backups when all you have to do is fix your query.

V: What do you mean I know nothing about SQL, you’re the one who thinks backups don’t cause blocks when I see that all the time.

M: No, what you see all the time is your process causing blocks and people running backups.  Because I’m on your server now and I can see the blocking you’re referring to and it has nothing to do with the log backup, it has to do with your process on spid 201.

V: I thought you were a DBA.  When there’s severe blocking in a system SQL Server sometimes reports the wrong spid.  Our tech lead assures us that this is coming from the backup.

M: Well, if you’re just going to insult me we can end this call right now because I really don’t have time for this.  I can’t teach every moron who writes an app for SQL everything they need to know to support it.

V: Now who’s being insulting?

M: Oh, that would still be you.  Because you know nothing about SQL and you hop on the phone and start calling me an idiot because you don’t know how to troubleshoot a simple blocked process.  Since we’ve been on the phone, I’ve looked at the sp that’s causing this and the problem is simple; you’ve got a hardcoded TABLOCK in 2 of your queries.  That’s what’s causing the blocking.  Have you ever bothered looking at this process to see what it’s doing?

V:  No, because it’s the backup causing it.  There’s nothing wrong with the code.  We run this at several customer sites and nobody ever complains.

M: Ok, let me try something different here… if SQL’s reporting a different spid for the blocking process, how does your tech lead know that this incorrect spid points back to the backup?  Why couldn’t it be pointing to one of the many other processes running right now?

V: He’s got a script that can tie them together.

M: I would love to see that script.  Maybe I could learn something, huh?  But if SQL itself gets confused about the spid and doesn’t report it right in sysprocesses, then how can he tie it to the right process in his script?  Wouldn’t SQL tie it to the right spid and present it to us the right way to begin with?

V: He won’t show anyone the script because he says it’s too advanced and we don’t need to know how to do those types of things. 

M: Wow, that’s convenient isn’t it?  Let me try something different here.  Because now this is like a train wreck I just can’t turn away from.  When you kill the backups at other client sites, does the process clear up right away?

V: No, unfortunately, the database can’t recover from something that’s been blocked by a backup so we have to kill our process too and then things work fine.

M: My, you’ve just built your own little world in there haven’t you?  You’ve got excuses for everything.

V:  They’re not excuses.  It’s how databases work.  Where did they find you anyway?  Have you ever been to any SQL Server classes? Maybe they should send you to some.

M: No, I’ve never been to any SQL classes.  I’ve never needed them.  And if you call me an idiot again you’re gonna find out what else I can do.  In fact, I’m a hair away from calling the CEO of your company and letting him know what kind of stupid, snide little ass he’s got working for him.  So you’d better watch your tone with me boy and remember I’M the customer.  So you don’t tell me what to do and you don’t get on the phone with me and call me an idiot.  And here’s something else for you.  Before I even got on this call I ripped your access from the box because I don’t want you touching anything on our server.  Nobody’s turning off our backups, and nobody’s ever touching our server.  And I’m actually done with you.  I’m not going to troubleshoot this with you one more second.  Here’s what I want.  I want you to escalate this to your team lead and I’ll fight this out with him.  I want a call with him at the next available opportunity.  You are out of this issue.

V: Look, I’m sorry if you thought I was being

M: Shut up.  I’m done with you.  The next call I get had better be from your team lead.  Good-bye.

I’m telling you guys, if voodoo were real I’d be sticking pins in this guy’s ass for the next month.  And the sad thing is I doubt it’s his fault.  He’s got some jackass yanking his chain and this is probably just some kid who they’ve taught to be this way.  He sounded like he was probably in his early 20’s.  I still haven’t heard from his tech lead, btw.
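For anyone wondering how you see which spid is doing the blocking, here’s a minimal sketch using the DMVs.  It isn’t the exact query I ran on the call, and the column list is just what I’d want to see first.

```sql
-- Sketch: list blocked requests and the session blocking each one.
SELECT
    r.session_id,
    r.blocking_session_id,          -- the spid holding things up
    r.wait_type,
    r.wait_time,                    -- milliseconds spent waiting so far
    r.command,
    t.text AS BlockedSqlText
FROM sys.dm_exec_requests r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;
```

Follow blocking_session_id up the chain until you hit a session that isn’t blocked by anything itself, and that’s your culprit.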

Why won’t the log shrink?

I know that everybody always says you shouldn’t shrink your log files, but let’s face it, sometimes it really is necessary.  Maybe you had a big op that was out of the norm and it grew your log bigger than it usually runs, or maybe (and more common) your backups were failing for a few days before you noticed and the log is now huge and you need to get it back down to a more manageable size.  For whatever reason, you need to shrink your log.  Now, I’m gonna show you the T-SQL way because it’s what I’ve got handy and I’ve used this script for years.

So here’s the script and then we’ll discuss a couple things… because the script isn’t the meat of this post.  But this one will do all DBs on the server, so just use the logic if you don’t like the cursor.  And it has no effect on DBs that can’t have their logs shrunk so don’t worry about causing damage.

Declare @curDBName sysname

Declare DBName Cursor For

Select Name from sysDatabases

Open DBName

Fetch Next From DBName into @curDBName

while @@Fetch_Status = 0

Begin

DBCC ShrinkDataBase (@curDBName , TRUNCATEONLY)

Fetch Next From DBName into @curDBName

End

Close DBName

DeAllocate DBName

OK, so that’s fairly standard admin stuff.  Now, if you wanna check that you’ve had the effect you expected, you can run this command both before and after to make sure you’ve shrunk it.

DBCC SQLPERF(LOGSPACE)

And what you’re going to get from that is a result set that gives you the DBName, the log file size, and the % used.  What you want to look at is the log file size and the %Used.  If the size of the file is really big and the %Used is really small, that means your log file is a lot bigger than it needs to be.  So an example would be you’ve got a log file that’s 52000MB and 3% used.  That’s a lot of empty space.  So you’ve definitely got to use the above cmd to shrink it. 

But what happens if you shrink it with that cmd and it doesn’t shrink?  Or maybe it just shrinks a little bit?  Well, now you’ve got to troubleshoot why it’s not shrinking.  This is one of my standard interview questions and you’d be surprised how many people get it wrong.  As a matter of fact, I’ve never had anybody get it right.  So here’s what you do… pick up a book and become a DBA.  Ok, seriously though, it’s extremely easy.

Run this query in master to see what your log may be waiting on to truncate.

SELECT Name, log_reuse_wait_desc FROM sys.databases

What you’ll get here is a result that tells you what the log is waiting for before it can reuse the VLFs, and in this case, kill them so you can shrink the file.  There are any number of things that can show up here as the reason.  You could be waiting on a long transaction, or on a log backup, or on replication, etc.  So next time you have a problem getting a log to shrink, come here first and find out what it’s waiting for before you go guessing what the problem might be.
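If you want the wait reason and the log size in one place, something like this sketch works.  Joining to sys.master_files is my addition for the example, and the aliases are made up.

```sql
-- Sketch: log file size next to the reason the log can't be reused yet.
SELECT
    d.name                                   AS DBName,
    d.recovery_model_desc,
    d.log_reuse_wait_desc,
    mf.name                                  AS LogicalLogName,
    CAST(mf.size AS bigint) * 8 / 1024       AS LogSizeMB    -- size is stored in 8KB pages
FROM sys.databases d
INNER JOIN sys.master_files mf
    ON d.database_id = mf.database_id
WHERE mf.type_desc = 'LOG'
ORDER BY LogSizeMB DESC;
```

A big log with LOG_BACKUP next to it means you need a log backup before it’ll shrink, while NOTHING means you’re clear to shrink right now.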

That’s all I’ve got.

Ok, I got a comment below that says BOL states that TRUNCATEONLY only works for data files. I checked and he’s right, BOL does say that. However, I’ve been using this method for years and I just ran a test on it individually using the code I pasted out of this post so BOL is wrong on this one. I ran my test on SQL2K8 but I don’t see any reason why it would change significantly with a lower version.

The hidden update

I had a cool situation today that didn’t come to me right away so I thought I’d share it with all of you.  We’re going to be talking about the lovely deadlock today.  I can’t remember if I’ve ever talked about it before, but I am now so there…

OK, here’s the situation.  I was called over by our very sexy web team lead (George), who told me that they were having deadlock issues on the web portal DB.  Fine, I’ll put a server-side trace on it and see what comes up.  For this I used the Locks\DeadlockGraph.  Once I got my info back, I noticed that we had table1, table2, and table3 in the mix.  One session was running a delete against table1.  Then another session ran an update against table2 joined to table3.  The problem is that the update was deadlocking with the delete and the delete was losing every time.  And also, why was the deadlock on table1?  The update doesn’t even touch table1.

For starters, all the tables have an update trigger that pulls the inputbuffer and session info for the spid that ran the update.  It then puts this info in a log table.  I don’t know why.  Unfortunately that wasn’t the problem.  I checked and none of the tables turned out to be views either so that avenue was dead.  The problem was just a tiny bit buried, but I eventually found it.  There was another table in the mix… table4.  Table3 had an update cascade set on its FK to table4 and table4 had an FK back to table1.  AH-HA… there’s your connection.  Now, as well, there wasn’t an index on the FK col in table1, so it was doing a scan.  Nice huh?

So my recommended fix was as follows:

1.  Get rid of the auditing update triggers.  If you really want to log the action then put that code in an SP and call it after your update is done, but not as part of the main transaction.  Yes, I’m aware of the very minute risks in this, but they’re far out-weighed by completing your transaction so much faster.

2.  Put an index on the table1 FK column.  This is probably going to give you the biggest bang for your buck.  If it’s not doing a table scan, then it’ll get in and out faster so there’ll be less chance of deadlocking with the delete.  I believe the delete is also searching on the same col so it would really be worthwhile.

3.  Use updlock on the update query. 

My whole plan here is to get these transactions as short as possible.  It’s not enough to have efficient queries because if you’ve got a bunch of other stuff in the transaction then you might as well be doing a table scan every time.  And I know that reading table1/table2/table3 and all that isn’t easy to follow, but hey, we’ve gotta scrub these things for the internet, right?  Really the whole point is that you have to dig sometimes to find the source of something.  I knew the deadlock was on that other table and I could see the index scan in the execution plan, but that table wasn’t listed in the query or in the trigger.  So it had to be coming from somewhere.  So the basic point of this is to remind you of some of the things that are possible so you can remember to check these types of things too.
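One of those things to check is cascading FKs, since they pull tables into a transaction that the query never mentions.  Here’s a minimal sketch for finding them; the aliases are mine, and the index at the bottom uses made-up names since the real tables are scrubbed.

```sql
-- Sketch: list foreign keys that cascade (or set null/default) on update or delete.
SELECT
    fk.name                                   AS ForeignKeyName,
    OBJECT_NAME(fk.parent_object_id)          AS ReferencingTable,
    OBJECT_NAME(fk.referenced_object_id)      AS ReferencedTable,
    fk.update_referential_action_desc,
    fk.delete_referential_action_desc
FROM sys.foreign_keys fk
WHERE fk.update_referential_action <> 0       -- 0 = NO_ACTION
   OR fk.delete_referential_action <> 0
ORDER BY ReferencingTable, ForeignKeyName;

-- Hypothetical fix for item 2: index the FK column so the lookup stops scanning.
-- CREATE NONCLUSTERED INDEX IX_Table1_FKCol ON dbo.Table1 (FKCol);
```

Anything that comes back from the first query is a table relationship that can show up in a deadlock graph without ever showing up in your code.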