All posts by Sean McCown

I am a Contributing Editor for InfoWorld Magazine, and a frequent contributor to SQLServerCentral.com as well as SSWUG.org. I live with my wife and 3 kids, and have practiced and taught Kenpo for 22yrs now.

Why can’t voodoo be real?

I had a very interesting talk with a vendor this morning.  Ok, it wasn’t really interesting as much as maddening.  I swear, it’s guys like this that are going to be the death of me.  I know I shouldn’t let them get to me but sometimes they do.  You know how it goes, you try to explain something and it just escalates until you’re upset.  Anyway…

OK, so they were having blocking issues in the DB.  The vendor was trying to convince the app team to turn off the SQL backups because they were interfering with their processes.  That’s when they called me.  Frankly, I can’t believe they called because the vendor was very adamant and this is the kind of thing our app groups do on their own without consulting us at all, so I’m more shocked than anything.  But I knew this was going to be a bad call the second my app guy told me what was going on.

Here’s the conversation as it took place just a couple hrs ago.  It took me this long to regain my strength.

V = vendor

M = me

M: Ok, I’m here now.  So explain to me what’s going on.

V: Well, we’re having severe blocking issues in the DB and after looking at it we noticed it’s the SQL backups causing it.  So what we need to do is stop the SQL backups.

M: Backups don’t hold locks in the DB so there’s no way for it to be causing blocking.

V: Of course backups hold locks.  Everything holds locks.

M: No, they don’t hold locks.  There’s never been a case where a backup has blocked a process.

V: We see it all the time.  And you’ll just have to trust me because I do this for a living.  I’m very familiar with this app and this is how we always fix this issue.

M: You always fix your poor coding issues by having people not backup their data?

V: You can backup your data, but you have to schedule downtime every week to do it… to keep it from blocking.

M: I really don’t want to finish this conversation, but I’m developing this morbid curiosity about where this is leading.

V: What does that mean?

M: Well, you clearly know nothing about SQL and you’re doing your customers such a huge disservice by making them turn off their backups when all you have to do is fix your query.

V: What do you mean I know nothing about SQL, you’re the one who thinks backups don’t cause blocks when I see that all the time.

M: No, what you see all the time is your process causing blocks and people running backups.  Because I’m on your server now and I can see the blocking you’re referring to and it has nothing to do with the log backup, it has to do with your process on spid 201.

V: I thought you were a DBA.  When there’s severe blocking in a system SQL Server sometimes reports the wrong spid.  Our tech lead assures us that this is coming from the backup.

M: Well, if you’re just going to insult me we can end this call right now because I really don’t have time for this.  I can’t teach every moron who writes an app for SQL everything they need to know to support it.

V: Now who’s being insulting?

M: Oh, that would still be you.  Because you know nothing about SQL and you hop on the phone and start calling me an idiot because you don’t know how to troubleshoot a simple blocked process.  Since we’ve been on the phone, I’ve looked at the sp that’s causing this and the problem is simple; you’ve got a hardcoded TABLOCK in 2 of your queries.  That’s what’s causing the blocking.  Have you ever bothered looking at this process to see what it’s doing?

V:  No, because it’s the backup causing it.  There’s nothing wrong with the code.  We run this at several customer sites and nobody ever complains.

M: Ok, let me try something different here… if SQL’s reporting a different spid for the blocking process, how does your tech lead know that this incorrect spid points back to the backup?  Why couldn’t it be pointing to one of the many other processes running right now?

V: He’s got a script that can tie them together.

M: I would love to see that script.  Maybe I could learn something, huh?  But if SQL itself gets confused about the spid and doesn’t report it right in sysprocesses, then how can he tie it to the right process in his script?  Wouldn’t SQL tie it to the right spid and present it to us the right way to begin with?

V: He won’t show anyone the script because he says it’s too advanced and we don’t need to know how to do those types of things. 

M: Wow, that’s convenient isn’t it?  Let me try something different here.  Because now this is like a train wreck I just can’t turn away from.  When you kill the backups at other client sites, does the process clear up right away?

V: No, unfortunately, the database can’t recover from something that’s been blocked by a backup, so we have to kill our process too and then things work fine.

M: My, you’ve just built your own little world in there haven’t you?  You’ve got excuses for everything.

V:  They’re not excuses.  It’s how databases work.  Where did they find you anyway?  Have you ever been to any SQL Server classes? Maybe they should send you to some.

M: No, I’ve never been to any SQL classes.  I’ve never needed them.  And if you call me an idiot again you’re gonna find out what else I can do.  In fact, I’m a hair away from calling the CEO of your company and letting him know what kind of stupid, snide little ass he’s got working for him.  So you’d better watch your tone with me, boy, and remember I’M the customer.  So you don’t tell me what to do and you don’t get on the phone with me and call me an idiot.  And here’s something else for you.  Before I even got on this call I ripped your access from the box because I don’t want you touching anything on our server.  Nobody’s turning off our backups, and nobody’s ever touching our server.  And I’m actually done with you.  I’m not going to troubleshoot this with you one more second.  Here’s what I want.  I want you to escalate this to your team lead and I’ll fight this out with him.  I want a call with him at the next available opportunity.  You are out of this issue.

V: Look, I’m sorry if you thought I was being

M: Shut up.  I’m done with you.  The next call I get had better be from your team lead.  Good-bye.

I’m telling you guys, if voodoo were real I’d be sticking pins in this guy’s ass for the next month.  And the sad thing is I doubt it’s his fault.  He’s got some jackass yanking his chain and this is probably just some kid who they’ve taught to be this way.  He sounded like he was probably in his early 20’s.  I still haven’t heard from his tech lead, btw.
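And for what it’s worth, when I say I’m on the server and I can see the blocking myself, this is roughly the kind of check I mean.  A minimal sketch for 2K5 and up, not the exact script I ran that day:

-- Who's blocking whom right now, and what the blocked sessions are running
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS running_sql
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.blocking_session_id <> 0;

Point being, the blocker shows up as a plain old spid running a plain old query.  Not a backup.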

DAS isn’t a curse word

I deal with this from time to time where users (and admins too, unfortunately) will not only shy away from DAS, but actually act like it’s some kind of computer virus.  Now that SAN is here and so ubiquitous, nobody thinks DAS is worth anything anymore.  I’m fully aware of all the arguments, but remember that DAS gave us some really nice control over our disk layouts, and we didn’t have to keep going to the SAN guys to ask them for dedicated spindles, which is something they hate to give.  And that’s really something to be said for DAS, isn’t it?  We don’t have to worry about our DAS spindles being shared with 10 other servers. 

So whenever someone talks about DAS vs SAN, I always try to insert some reason into the debate.  SAN is good for what it’s good at, but so is DAS.  And there are often times I miss the DAS days.  I never had to fight with a SAN guy about whether the SAN can handle all of the disks being on the same physical partition.  I never had to try to explain to them that a DB isn’t a file system and you can’t just put everything on the same chunk of disks with 10 other servers.  I never had to get special permission to have a different RAID level put on the server.  But don’t get me wrong, I’m not anti SAN.  I just don’t think people give DAS its due credit anymore.  For a lot of systems in your shop, a couple DAS cabinets on each server can ease some of your I/O trouble.  And of course, that’s at the cost of management for the disk guys, but everything comes at a cost.

Dealing with superstition

To be a DBA for any real length of time takes a certain type of personality.  For starters it helps if you’re fairly strong-willed and confident.  In order to deal with a lot of the customers you find yourself coming in contact with, it really helps if you can stand your ground.

This is the exact type of guy that we’ve had as the past 3 DBAs before I got here.  The problem is that they were apparently pretty dumb, so they spread so much misinformation about SQL and about DBs in general that it’s a constant battle for me to undo it all.  Like I said, these guys were dumb, not malicious, so I really don’t think they meant to spread that much bad information… I think they just didn’t know how to pick up a book.

So this situation causes a special problem for me because I’m trying really hard to do things right here but a lot of my customers and vendors just won’t let me.  “Well, the last guy told us this is how it works”.  Yeah, I bet he did, but that’s not right.  And I can tell them it’s not right 100x, but they’ve heard the same tired story for years now so they’re not likely to believe me even after I prove it to them.  And we all know that the guy who says it the loudest is typically believed.  And I’ve heard that our previous DBAs were all quite passionate that things be done a certain way and they really went to bat for what they wanted.  What’s funny is they went to bat for ridiculous things, and then let a lot of really big things slide… but that’s another post isn’t it?

The only thing I can hope for at this point is to stick around longer than the other DBAs did and eventually my rhetoric will start to sink its way into their heads.  I’m thinking one way to do it is to just flat-out lie.  Just tell them, yeah, that’s the way it used to be, but on this new version they’ve fixed all that.  Something tells me that’ll work out much better than just telling them the last guys were idiots.  Because if I keep doing that, then I might as well be trying to talk the Vatican out of religion.

Are you kidding me?

This one has me going so much I had to blog for a 3rd time today.  I keep finding boxes that have failing backups and other processes.  These jobs have been failing for weeks in some cases.  And why, you ask?  Hah, that’s a good question.  The reason they keep failing, my most learned internet peeps, is because the previous DBA was running them under his own personal acct.  He had scads of jobs and other processes running under his acct… as did the DBA before him and the DBA before him. 

Are you people kidding me with this crap?  Ok, for those of you who don’t know enough to be sure why I’m upset, you never run anything under your own acct… or anyone else’s personal acct for that matter.  Even if you don’t leave this gig, you’ll still be forced to change your password at some point and there’s no way you can keep up with every job, service, ssis pkg, script, website, credential, etc. that you’ve got running under your acct.  And then you’ve gotta get everyone else involved to help you find them all because you keep getting locked out every 5mins.
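If you want to see how deep you’re in it on one of your own boxes, something like this lists every Agent job that isn’t owned by sa.  Just a quick sketch; point the filter at whatever owner you actually standardize on:

-- Agent jobs and their owners; anything under a personal acct shows up here
SELECT j.name AS JobName,
       SUSER_SNAME(j.owner_sid) AS JobOwner,
       j.enabled
FROM msdb.dbo.sysjobs j
WHERE SUSER_SNAME(j.owner_sid) <> 'sa'
ORDER BY JobOwner, JobName;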

When IT was young you could excuse these things because we were still figuring things out then, but there’s just no excuse for this level of idiocy anymore.  Come on guys, straighten up.

Why won’t the log shrink?

I know that everybody always says you shouldn’t shrink your log files, but let’s face it, sometimes it really is necessary.  Maybe you had a big op that was out of the norm and it grew your log bigger than it usually runs, or maybe (and more common) your backups were failing for a few days before you noticed and the log is now huge and you need to get it back down to a more manageable size.  For whatever reason, you need to shrink your log.  Now, I’m gonna show you the T-SQL way because it’s what I’ve got handy and I’ve used this script for years.

So here’s the script and then we’ll discuss a couple things… because the script isn’t the meat of this post.  But this one will do all DBs on the server, so just use the logic if you don’t like the cursor.  And it has no effect on DBs that can’t have their logs shrunk so don’t worry about causing damage.

Declare @curDBName sysname

Declare DBName Cursor For
Select Name from sysDatabases

Open DBName
Fetch Next From DBName into @curDBName

While @@Fetch_Status = 0
Begin
      DBCC ShrinkDataBase (@curDBName, TRUNCATEONLY)
      Fetch Next From DBName into @curDBName
End

Close DBName
DeAllocate DBName

OK, so that’s fairly standard admin stuff.  Now, if you wanna check that you’ve had the effect you expected, you can run this command both before and after to make sure you’ve shrunk it.

DBCC SQLPERF(LOGSPACE)

And what you’re going to get from that is a result set that gives you the DBName, the log file size, and the % used.  What you want to look at is the log file size and the %Used.  If the size of the file is really big and the %Used is really small, that means your log file is a lot bigger than it needs to be.  So an example would be you’ve got a log file that’s 52000MB and 3% used.  That’s a lot of empty space.  So you’ve definitely got to use the above cmd to shrink it. 

But what happens if you shrink it with that cmd and it doesn’t shrink?  Or maybe it just shrinks a little bit?  Well, now you’ve got to troubleshoot why it’s not shrinking.  This is one of my standard interview questions and you’d be surprised how many people get it wrong.  As a matter of fact, I’ve never had anybody get it right.  So here’s what you do… pick up a book and become a DBA.  Ok, seriously though, it’s extremely easy.

Run this query in master to see what your log may be waiting on to truncate.

SELECT Name, log_reuse_wait_desc FROM sys.databases

What you’ll get here is a result that tells you what the log is waiting for before it can reuse the VLFs, and in this case, kill them so you can shrink the file.  There are any number of things that can show up here as the reason.  You could be waiting on a long transaction, or on a log backup, or on replication, etc.  So next time you have a problem getting a log to shrink, come here first and find out what it’s waiting for before you go guessing what the problem might be.
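And once you know what it’s waiting on, the fix usually falls right out.  For the most common one (LOG_BACKUP), here’s a minimal sketch, assuming a DB called MyDB with a log file named MyDB_log and a backup path that’s obviously made up:

-- Clear the LOG_BACKUP wait, then shrink the log file itself
BACKUP LOG MyDB TO DISK = 'F:\SQLBackups\MyDB_log.trn';

USE MyDB;
DBCC SHRINKFILE (MyDB_log, 1024);   -- target size in MB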

That’s all I’ve got.

[EDIT]
Ok, I got a comment below that says BOL states that TRUNCATEONLY only works for data files. I checked and he’s right, BOL does say that. However, I’ve been using this method for years and I just ran a test on it individually using the code I pasted out of this post so BOL is wrong on this one. I ran my test on SQL2K8 but I don’t see any reason why it would change significantly with a lower version.

Audit service accounts with Powershell

I thought I’d kick off the week with a useful little PS script I wrote last week.  It happens from time to time that you need to find out what AD accts your SQL services are running under.  It could be that you simply want to see how diverse they are, or maybe you want to change the password of an acct and you want to see what the impact will be.  All the same, it can be useful to audit the startup accts for your SQL services. 

So here’s a little script you can use to get that done.  This again is part of a much larger process I have, but I’ve pulled it out and simplified it to its base components for you.  I’ve also added the DB side of the process for completeness. 

Here’s the PS piece:
$Server = 'localhost'
$StatsDB = 'DBStats'
$SourceSQLScript1 = "\\Server1\F$\SQLServerDBA\Scripts\Collector\SQLQueries\AllSQLBoxes.txt";

$SqlCmd1 = Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -InputFile $SourceSQLScript1
$SqlCmd1 | % {
        $_.ServerName; ####Print ServerName
        [System.Int32]$ID = $_.InstanceID;
        [System.String]$ServerName = $_.ServerName;
        $ExecutionDateTime = Get-Date -Format "%M/%d/%y %H:m:ss";

        $a = Get-WmiObject win32_service -ComputerName $ServerName -Filter "DisplayName like '%sql%'" -ErrorVariable err -ErrorAction SilentlyContinue

        $a | % { $DisplayName = $_.DisplayName;
                 $StartName = $_.StartName;

                 Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -Query "Collector.spServiceAcctInsert '$ExecutionDateTime','$ID','$DisplayName','$StartName'"
        }
}

Most of this is the same as the other scripts I’ve posted so I’m not really gonna go into every line like I have in the past.  I am going to explain the params for the invoke-sqlcmd line though.  In this line I’m just calling an SP that I’ve put in place to log these details to a table and you can use as much or as little of this as you wish.  The $ID param is the ID of the server.  In my complete solution I have a server table and I log things to other tables using the ServerID that comes from that table.  You’re free however to just capture the ServerName here instead.  But you notice that in the query, I get the ID and the ServerName and I use those in the script.  I’m just presenting you with how I do it and you can change it. 

Here’s the SP that I use in the DB that gets called by this script.  It’s very simple. 

CREATE PROCEDURE [Collector].[spServiceAcctInsert]
      @ExecutionDateTime datetime,
      @InstanceID int,
      @ServiceName varchar(50),
      @StartName varchar(50)
AS
INSERT dbo.ServiceAcct
SELECT
      @ExecutionDateTime,
      @InstanceID,
      @ServiceName,
      @StartName

And now here’s the table that you’re logging to:

CREATE TABLE [dbo].[ServiceAcct](
      [ID] [int] IDENTITY(1,1) NOT NULL,
      [ExecutionDateTime] [datetime] NULL,
      [InstanceID] [int] NULL,
      [ServiceName] [varchar](50) NULL,
      [StartName] [varchar](50) NULL
)

And that’s all there is to it. From here, now that you’ve got all of your service accts and the AD accts associated with them, you can easily query this table to see the impact of changing a password or just see how bad your security is. This is useful info to have. You don’t have to run this kind of thing often though. In fact, it’s not a query that I have scheduled. I typically just run it when I want to know something specific, but you may have a reason to schedule it.
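So for example, if you’re about to change the password on a particular acct, something like this against that table tells you exactly which services you’re going to break.  The acct name here is just a placeholder:

-- Every service we've logged that starts under this acct, and when we last saw it
SELECT InstanceID,
       ServiceName,
       StartName,
       MAX(ExecutionDateTime) AS LastSeen
FROM dbo.ServiceAcct
WHERE StartName = 'MYDOMAIN\svc_sql'    -- placeholder acct
GROUP BY InstanceID, ServiceName, StartName
ORDER BY InstanceID;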

OK guys, that’s all I’ve got on that.  Enjoy powershelling.

Today’s Powershell Fun

I’m having big fun with PS today because I’m installing several new SQL Server 2008 R2 boxes on Windows Server 2008 R2.  Where does PS come into play you ask?  Well, good question… I only wish I had a good answer.  Ok, I do.

In order to install SQL Server 2008 R2 on Windows Server 2008 R2 you have to first install the .NET Framework.  However, the SQL installer doesn’t do it for you.  It just tells you that you have to install it.  So the normal way is to go into Server Manager, then to ‘Roles’, and then ‘Add Roles’.  Once you’re in the Add Roles wizard choose ‘Application Server’.  Here are some brief screens of the process.

As you can see I’ve already installed it on this server, but it’s pretty easy to figure out from here.  Just install the .NET Framework (it’ll actually already be checked for you) and you’re golden.  Now, that takes too much time for me, especially when I’m installing several boxes at once.  So what I like to do is call on powershell for this task.  I’m going to take some extra time here to show you a couple things, but the process once you get it down is much faster than the wizard.

First of all, you have to load the windows modules into your PS session.  To do it manually, you can open PS and type the following command:

> Import-Module ServerManager

Here’s a screen:

However, now you’ll be able to run the next command.

> Get-WindowsFeature

This will show you all the features available and which ones are already installed.  Here’s a screen of the output.

And of course, you’ll notice on mine, it’s already installed, but it’ll be unchecked on yours.  OK, now you’re ready to do the install itself.

And for that, you need yet another command, but don’t worry, it’s easy. 

> Add-WindowsFeature AS-NET-Framework

Notice I put an arrow by the name of the feature above that you’ll feed to the Add-WindowsFeature cmdlet.  So in case you didn’t catch it, the syntax is simply the cmdlet and the name of the feature you want to install. 

Here’s a screen of the line as it’ll look when you type it.

And here’s what it looks like while it’s installing:

And that’s it guys.  The install is actually much quicker from the cmdline and you don’t have to worry about making a mistake through the GUI.  And of course you don’t have to go through all these steps every time.  Why not write a short script to do it for you?  It would look like this:

Import-Module ServerManager
Add-WindowsFeature AS-NET-Framework

That’s it.  That’s all it takes to install it from a script.  Now you can save that off to the network somewhere and just use it whenever you like. 

OK, that’s all I’ve got.

Oh yeah, one more thing… you don’t have to import the ServerManager module every time you run this.  You can use the PS shell that loads all the windows modules for you.  You can find it here:

Ok, THAT’S all I’ve got.

Read my mind

Man, talk about your all-time screw-ups!!

OK, here’s what happened.  I got a ticket the other day to truncate a table in a prod DB.  Simple enough, right?  So I wrote the guy and said, are you sure you want me to truncate this table?  He said, yeah, it’s taking up over half of our space.  I said ok, just give me a couple mins.  So I connected and truncated the table.

Well, then I get a call like 10mins later and the guy was frantic.  What happened to my data?  Well, you told me to truncate the table.  NOT ALL OF IT!!! So this is a valuable lesson isn’t it?  Always be sure what you’re asking for and never assume.  I of course didn’t assume anything.  We even get tickets to truncate entire DBs all the time, so it’s not that big of a deal for us.  Anyway, just be aware that many users have a completely different database vocabulary than you do.
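And for anyone who’s not sure why that went so wrong: truncate isn’t “trim it down a bit”, it’s everything, all at once.  A quick illustration with a made-up table and date column:

-- What the user probably meant: clear out the old stuff
DELETE FROM dbo.OrderHistory WHERE OrderDate < '20090101';

-- What 'truncate the table' actually means: every row, gone
TRUNCATE TABLE dbo.OrderHistory;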

Is this industry fun or what?

Powershell to recycle error log

Here’s a really good one. I noticed that very few of my boxes had a job to recycle the sql error logs. This is important because unless specifically done, the error log will only recycle when the service is burped in some way. For those of you who aren’t up on the lingo, recycling the error log is when you close down one log and create a new one. The error logs are just txt files kept on the server. So recycling creates a new physical txt file. And the reason you care about this is because some servers don’t get bounced very often at all and the files can get quite large. I have one server with a 40GB error log. And when I try to open it, it sits there for like 2hrs and then finally dies cause there aren’t enough system resources to deal with a file that big. So you can see why you want to keep the size of the files down.

And like I said, the error log only recycles when you burp the service or do it manually. And the way you do it manually is by running this code:

master.dbo.sp_cycle_errorlog

But doing it manually isn’t good enough because it has to be done on a regular basis. I prefer to recycle my logs every day. It’s a good generic schedule and honestly I don’t ever remember needing anything more granular. OK, so a new log every day is what we’re after. That means we need to put this code in a job and have it run every day at the same time.

So ok, that set the stage for what we’re wanting to do here. Now we can get into what we actually wanna talk about… and that’s how we push our new job to all of our boxes. Now, I’m going to do this a specific way and I’m doing it like this for a reason. I’ll explain once I’m done.

Here are the steps I want you to take to prep for this venture.

1. Create a job that runs every day (midnight is preferable) and include the above code in the job step.  If you’d rather hand-roll the job in T-SQL than build it in the GUI, there’s a rough sketch right after this list.
2. Be sure to set the job owner to sa.
3. Script the job and save it as a file somewhere.
4. You’ll have to script the job separately for the different versions of SQL.  SQL2K handles job schedules differently than the newer versions, so a script generated on 2K5+ blows up on it.  So at least have one script for SQL2K and below, and one for 2K5 and above.
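Here’s roughly what that hand-rolled job looks like in T-SQL.  It’s a sketch; the job name and schedule are just what I’d pick, and the exact script your GUI generates will differ between versions, which is the whole reason for step 4.

USE msdb;

DECLARE @jobId binary(16);

-- The job itself, owned by sa per step 2
EXEC dbo.sp_add_job
     @job_name = N'DBA - Recycle SQL Error Log',
     @owner_login_name = N'sa',
     @job_id = @jobId OUTPUT;

-- One step that runs the cycle proc
EXEC dbo.sp_add_jobstep
     @job_id = @jobId,
     @step_name = N'Cycle error log',
     @subsystem = N'TSQL',
     @command = N'EXEC master.dbo.sp_cycle_errorlog;';

-- Daily at midnight
EXEC dbo.sp_add_jobschedule
     @job_id = @jobId,
     @name = N'Daily at midnight',
     @freq_type = 4,            -- daily
     @freq_interval = 1,        -- every day
     @active_start_time = 0;    -- 00:00:00

-- Target the local server
EXEC dbo.sp_add_jobserver
     @job_id = @jobId,
     @server_name = N'(local)';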

 Now here’s the powershell script you’ll run.

# $Server and $StatsDB are the same central server and repository DB used in the earlier scripts
$Server = 'localhost'
$StatsDB = 'DBStats'

$SourceSQLScript1 = "\\Server01\F$\SQLServerDBA\Scripts\Collector\SQLQueries\ServerListAllSQL2000.txt";

$SqlCmd1 = Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -InputFile $SourceSQLScript1

$SqlCmd1 | % {

    [System.String]$ServerName = $_.ServerName;

    $ServerName;

    Invoke-Sqlcmd -ServerInstance $ServerName -Database "MSDB" -InputFile "\\Server01\F$\SQLServerDBA\Scripts\ServerConfig\RecycleSQLErrorLogJOB-SQL2K.sql" -SuppressProviderContextWarning

}

Now I’m going to explain what’s going on here and why:

1. $SourceSQLScript1 — This is the var that holds the location of the txt file where I store the query for the list of servers to run against. Inside the file looks like this: select ServerID as InstanceID, ServerName from dbo.Servers (nolock) where IsSQL = 1 and SQLVersion = 2000. The reason I do it this way is because if I ever need to change the query, all of the scripts that call the query will automatically pick up the change. So putting the query in a file is a good thing.

2. $SqlCmd1 — here’s where I’m actually running the query in the step above and setting the results to this var. Now I’ve got an array of ServerNames I can cycle through to run my script on.

3. [System.String]$ServerName = $_.ServerName; — Here I’m just setting the current cursor iteration of the ServerName equal to a var ($ServerName). It just makes it easier to read and if something comes up where I need to make a minor adjustment to how the ServerName looks, then I can mess with the var and not have to touch the rest of the script. Again, reducing effects to the script if I need to make changes to the value. So it’s like later on down the line if I discover that I need to put [] around it or something like that. This could also be done in the query file, but it’s nice to have the option here as well should I need something special for a single script.

4. $ServerName; — Here I’m just printing out the ServerName so I can watch my script cycle through the boxes. It’s my way of knowing what’s going on. I hate that blinking cursor.

5. invoke-sqlcmd — This is where the real magic happens. I’m connecting to each box and running the job script I saved above.

Here are some finer points of the logic for why I did things the way I did:

1. I put the server list query into a var. This is to allow you (or me) to get the server list any way we like w/o disturbing the rest of the query. You could have a hardcoded list of serverNames, or have them in a file… whatever you like. Here’s a really good example.

Instead of this:
$SqlCmd1 = Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -inputfile $SourceSQLScript1
You can have this:
$SqlCmd1 = "Server1", "Server2", "Server3", "Server4", "Server5"

 And the rest of the script stays the same because the cursor is run off of the var. It doesn’t care how that var gets populated.

 2. I chose to run this separately for each edition of SQL. So I ran it once for SQL2K, once for SQL2K5, etc. That’s because like I said there are differences in the job scripts for the different versions. I could have done it all in a single script, but I really don’t like coding that logic in there and it increases the size of the script because now I have to check for the version of SQL and make a decision before I can create the job. I have to point it to the right .sql script. That’s too much work when all I have to do is change the query file and the script file between runs. And this way, I can run them concurrently for each version simply by saving the script under different names and kicking them off at the same time.

 3. This one is my favorite because it’s what makes this so flexible. I chose to put the sql logic itself into a file and just call that file. The reason I did this is because not only is the sql logic too long for this type of script, but if I’ve got anything else I need to run against all my boxes, all I have to do is change the sql script I’m calling. I don’t have to make any changes at all to the rest of the script. So if I want to create or delete any specific users off of a group of boxes, all I have to do is put that code in a script and save it, and then point this script to it. Or anything else, right? This is really the bread and butter for powershell and multi-server mgmt.
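So as a made-up example, say I need to push a monitoring login to every box in the list.  The .sql file I’d point the script at might look like this (hypothetical file and login names), and the only thing that changes in the powershell above is the -InputFile path:

-- Contents of a hypothetical AddMonitorLogin.sql pushed by the same script
-- (2K5+ syntax; a SQL2K box would need the older sp_grantlogin instead)
IF NOT EXISTS (SELECT 1 FROM sys.server_principals WHERE name = N'MYDOMAIN\svc_monitor')
BEGIN
    CREATE LOGIN [MYDOMAIN\svc_monitor] FROM WINDOWS;
END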

So guys, take this boilerplate script and methodology and apply it to your own stuff. I’ve taken a lot of stuff out of the script for this blog because my scripts are part of a much larger solution I’ve developed and it’s not relevant to what you’ll be doing. But the script works as-is and you can add whatever you want to it. And now you’ve got a generic way to push T-SQL out to your boxes, and a generic way to populate the server list it goes against. What else do you need really?

The hidden update

I had a cool situation today that didn’t come to me right away so I thought I’d share it with all of you.  We’re going to be talking about the lovely deadlock today.  I can’t remember if I’ve ever talked about it before, but I am now so there…

OK, here’s the situation.  I was called over by our very sexy web team lead (George), who told me that they were having deadlock issues on the web portal DB.  Fine, I’ll put a server-side trace on it and see what comes up.  For this I used the Locks\DeadlockGraph event.  Once I got my info back, I noticed that we had table1, table2, and table3 in the mix.  One session was running a delete on table1 joined against itself.  Then another session ran an update against table2 joined to table3.  The problem is that the update was deadlocking with the delete and the delete was losing every time.  And also, why was the deadlock on table1?  The update doesn’t even touch table1. 

For starters, all the tables have an update trigger that pulls the input buffer and session info for the spid that ran the update.  It then puts this info in a log table.  I don’t know why.  Unfortunately that wasn’t the problem.  I checked and none of the tables turned out to be views either, so that avenue was dead.  The problem was just a tiny bit buried, but I eventually found it.  There was another table in the mix… table4.  Table3 had an update cascade set on its FK to table4, and table4 had an FK back to table1.  AH-HA… there’s your connection.  Now, as well, there wasn’t an index on the FK col in table1, so it was doing a scan.  Nice huh? 

So my recommended fix was as follows:

1.  Get rid of the auditing update triggers.  If you really want to log the action then put that code in an SP and call it after your update is done, but not as part of the main transaction.  Yes, I’m aware of the very minute risks in this, but they’re far outweighed by completing your transaction so much faster.

2.  Put an index on the table1 FK column (there’s a rough sketch of this right after the list).  This is probably going to give you the biggest bang for your buck.  If it’s not doing a table scan, then it’ll get in and out faster so there’ll be less chance of deadlocking with the delete.  I believe the delete is also searching on the same col so it would really be worthwhile.

3.  Use updlock on the update query. 
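To put a little meat on fix #2 and on that cascade discovery, here’s a rough sketch.  The column name is made up since the real names got scrubbed, and the second query is just how I’d confirm which FKs have cascades set:

-- Fix #2: index the FK col so the delete/cascade doesn't have to scan table1
CREATE NONCLUSTERED INDEX IX_table1_table4ID ON dbo.table1 (table4ID);   -- hypothetical column name

-- And to see which FKs cascade (like table3 -> table4 did here)
SELECT f.name AS FKName,
       OBJECT_NAME(f.parent_object_id) AS FKTable,
       OBJECT_NAME(f.referenced_object_id) AS ReferencedTable,
       f.update_referential_action_desc,
       f.delete_referential_action_desc
FROM sys.foreign_keys f
ORDER BY FKTable;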

My whole plan here is to get these transactions as short as possible.  It’s not enough to have efficient queries because if you’ve got a bunch of other stuff in the transaction then you might as well be doing a table scan every time.  And I know that reading table1/table2/table3 and all that isn’t easy to follow, but hey, we’ve gotta scrub these things for the internet, right?  Really the whole point is that you have to dig sometimes to find the source of something.  I knew the deadlock was on that other table and I could see the index scan in the execution plan, but that table wasn’t listed in the query or in the trigger.  So it had to be coming from somewhere.  So the basic point of this is to remind you of some of the things that are possible so you can remember to check these types of things too.