Dealing with superstition

Being a DBA for any real length of time takes a certain type of personality.  For starters, it helps if you're fairly strong-willed and confident, because to deal with a lot of the customers you come in contact with, you really have to be able to stand your ground.

That's exactly the type of guy we had in the three DBAs before I got here.  The problem is that they were apparently pretty dumb, so they spread so much misinformation about SQL and about DBs in general that it's a constant battle for me to undo it all.  Like I said, these guys were dumb, not malicious, so I really don't think they meant to spread that much bad information… I think they just didn't know how to pick up a book.

So this situation causes a special problem for me because I'm trying really hard to do things right here, but a lot of my customers and vendors just won't let me.  "Well, the last guy told us this is how it works."  Yeah, I bet he did, but that's not right.  And I can tell them it's not right 100 times, but they've heard the same tired story for years now, so they're not likely to believe me even after I prove it to them.  And we all know that the guy who says it the loudest is typically the one who's believed.  And I've heard that our previous DBAs were all quite passionate that things be done a certain way and really went to bat for what they wanted.  What's funny is they went to bat for ridiculous things and then let a lot of really big things slide… but that's another post, isn't it?

The only thing I can hope for at this point is to stick around longer than the other DBAs did so that eventually my rhetoric starts to sink into their heads.  I'm thinking one way to do it is to just flat-out lie.  Just tell them, yeah, that's the way it used to be, but on this new version they've fixed all that.  Something tells me that'll work out much better than telling them the last guys were idiots.  Because if I keep doing that, I might as well be trying to talk the Vatican out of religion.

Are you kidding me?

This one has me going so much I had to blog for a 3rd time today.  I keep finding boxes that have failing backups and other processes.  These jobs have been dead for weeks in some cases.  And why, you ask?  Hah, that's a good question.  The reason they keep failing, my most learned internet peeps, is that the previous DBA was running them under his own personal acct.  He had scads of jobs and other processes running under his acct… as did the DBA before him, and the DBA before him.

Are you people kidding me with this crap?  Ok, for those of you who don't know enough to be sure why I'm upset: you never run anything under your own acct… or anyone else's personal acct for that matter.  Even if you never leave this gig, you'll still be forced to change your password at some point, and there's no way you can keep up with every job, service, SSIS pkg, script, website, credential, etc. that you've got running under your acct.  And then you've gotta get everyone else involved to help you find them all because you keep getting locked out every 5 mins.
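
If you wanna see how deep the hole goes on your own boxes, job ownership is the easy place to start.  Here's a quick check against msdb; I'm flagging anything not owned by sa, so adjust the filter to whatever standard acct you use:

SELECT j.name AS JobName, SUSER_SNAME(j.owner_sid) AS JobOwner
FROM msdb.dbo.sysjobs j
WHERE SUSER_SNAME(j.owner_sid) <> 'sa'   -- anything personal shows up here
ORDER BY JobOwner, JobName;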

When IT was young you could excuse these things because we were still figuring things out then, but there’s just no excuse for this level of idiocy anymore.  Come on guys, straighten up.

Why won’t the log shrink?

I know that everybody always says you shouldn't shrink your log files, but let's face it, sometimes it really is necessary.  Maybe you had a big op that was out of the norm and it grew your log bigger than it usually runs, or maybe (and more commonly) your backups were failing for a few days before you noticed and the log is now huge and you need to get it back down to a more manageable size.  For whatever reason, you need to shrink your log.  Now, I'm gonna show you the T-SQL way because it's what I've got handy and I've used this script for years.

So here's the script, and then we'll discuss a couple things… because the script isn't the meat of this post.  Note that this one will do all the DBs on the server, so just lift out the logic if you don't like the cursor.  And it has no effect on DBs that can't have their logs shrunk, so don't worry about causing damage.

Declare @curDBName sysname

Declare DBName Cursor For
Select Name From sysDatabases

Open DBName
Fetch Next From DBName Into @curDBName

While @@Fetch_Status = 0
Begin
      DBCC ShrinkDataBase (@curDBName, TRUNCATEONLY)
      Fetch Next From DBName Into @curDBName
End

Close DBName
DeAllocate DBName

OK, so that's fairly standard admin stuff.  Now, if you wanna check that you've had the effect you expected, you can run this command both before and after to make sure you've shrunk it:

DBCC SQLPERF(LOGSPACE)

And what you're going to get from that is a result set that gives you the DB name, the log file size, and the % used.  What you want to look at are the log file size and the %Used.  If the size of the file is really big and the %Used is really small, your log file is a lot bigger than it needs to be.  An example would be a log file that's 52000MB and 3% used.  That's a lot of empty space, so you've definitely got to use the above cmd to shrink it.

But what happens if you shrink it with that cmd and it doesn’t shrink?  Or maybe it just shrinks a little bit?  Well, now you’ve got to troubleshoot why it’s not shrinking.  This is one of my standard interview questions and you’d be surprised how many people get it wrong.  As a matter of fact, I’ve never had anybody get it right.  So here’s what you do… pick up a book and become a DBA.  Ok, seriously though, it’s extremely easy.

Run this query in master to see what your log may be waiting on to truncate.

SELECT Name, log_reuse_wait_desc FROM sys.databases

What you'll get here is a result that tells you what the log is waiting for before it can reuse the VLFs, and in this case, clear them out so you can shrink the file.  Any number of things can show up here as the reason: you could be waiting on a long-running transaction, on a log backup, on replication, etc.  So the next time you have a problem getting a log to shrink, come here first and find out what it's waiting for before you go guessing at the problem.
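
And once whatever it's waiting on clears, if you'd rather hit one database's log directly instead of cursoring through everything, DBCC SHRINKFILE will do it.  Here's a minimal sketch; MyDB and MyDB_log are placeholder names, so look up your actual logical log file name first:

USE MyDB;
-- find the logical name of the log file (type_desc = LOG)
SELECT name, type_desc FROM sys.database_files;
-- shrink the log down to roughly 1GB (the target size is in MB)
DBCC SHRINKFILE (MyDB_log, 1024);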

That’s all I’ve got.

[EDIT]
Ok, I got a comment below that says BOL states that TRUNCATEONLY only works for data files. I checked and he's right, BOL does say that. However, I've been using this method for years, and I just ran a test using the exact code I pasted into this post, so BOL is wrong on this one. I ran my test on SQL2K8, but I don't see any reason why it would behave significantly differently on a lower version.

Audit service accounts with Powershell

I thought I'd kick off the week with a useful little PS script I wrote last week.  It happens from time to time that you need to find out what AD accts your SQL boxes are running under.  It could be that you simply want to see how diverse they are, or maybe you want to change the password of an acct and you want to see what the impact will be.  All the same, it can be useful to audit the startup accts for your SQL services.

So here's a little script you can use to get that done.  This again is part of a much larger process I have, but I've pulled it out and simplified it to its base components for you.  I've also added the DB side of the process for completeness.

Here’s the PS piece:
$Server = 'localhost'
$StatsDB = 'DBStats'
$SourceSQLScript1 = "\\Server1\F$\SQLServerDBA\Scripts\Collector\SQLQueries\AllSQLBoxes.txt";

$SqlCmd1 = Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -InputFile $SourceSQLScript1
$SqlCmd1 | % {
    $_.ServerName;    # print ServerName so you can watch it work through the list
    [System.Int32]$ID = $_.InstanceID;
    [System.String]$ServerName = $_.ServerName;
    $ExecutionDateTime = Get-Date -Format "MM/dd/yy HH:mm:ss";

    # grab every SQL-related service on the box along with its startup acct
    $a = Get-WmiObject win32_service -ComputerName $ServerName -Filter "DisplayName like '%sql%'" -ErrorVariable err -ErrorAction SilentlyContinue

    $a | % {
        $DisplayName = $_.DisplayName;
        $StartName = $_.StartName;

        # log the service and its startup acct to the collection DB
        Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -Query "Collector.spServiceAcctInsert '$ExecutionDateTime','$ID','$DisplayName','$StartName'"
    }
}

Most of this is the same as the other scripts I’ve posted so I’m not really gonna go into every line like I have in the past.  I am going to explain the params for the invoke-sqlcmd line though.  In this line I’m just calling an SP that I’ve put in place to log these details to a table and you can use as much or as little of this as you wish.  The $ID param is the ID of the server.  In my complete solution I have a server table and I log things to other tables using the ServerID that comes from that table.  You’re free however to just capture the ServerName here instead.  But you notice that in the query, I get the ID and the ServerName and I use those in the script.  I’m just presenting you with how I do it and you can change it. 

Here’s the SP that I use in the DB that gets called by this script.  It’s very simple. 

CREATE PROCEDURE [Collector].[spServiceAcctInsert]
      @ExecutionDateTime datetime,
      @InstanceID int,
      @ServiceName varchar(50),
      @StartName varchar(50)
AS

Insert dbo.ServiceAcct
Select @ExecutionDateTime, @InstanceID, @ServiceName, @StartName

And now here’s the table that you’re logging to:

CREATE TABLE [dbo].[ServiceAcct](
      [ID] [int] IDENTITY(1,1) NOT NULL,
      [ExecutionDateTime] [datetime] NULL,
      [InstanceID] [int] NULL,
      [ServiceName] [varchar](50) NULL,
      [StartName] [varchar](50) NULL
)

And that’s all there is to it. From here, now that you’ve got all of your service accts and the AD accts associated with them, you can easily query this table to see the impact of changing a password or just see how bad your security is. This is useful info to have. You don’t have to run this kind of thing often though. In fact, it’s not a query that I have scheduled. I typically just run it when I want to know something specific, but you may have a reason to schedule it.
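
For instance, say you're about to change the password on one svc acct and you wanna know the blast radius.  A query like this does it; the acct name is obviously just a placeholder:

SELECT InstanceID, ServiceName, StartName
FROM dbo.ServiceAcct
WHERE StartName = 'MYDOMAIN\sqlsvc'   -- hypothetical acct; use the one you're changing
ORDER BY InstanceID;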

OK guys, that’s all I’ve got on that.  Enjoy powershelling.

Today’s Powershell Fun

I'm having big fun with PS today because I'm installing several new SQL Server 2008 R2 boxes on Windows Server 2008 R2.  Where does PS come into play, you ask?  Well, good question… I only wish I had a good answer.  Ok, I do.

In order to install SQL Server 2008 R2 on Windows Server 2008 R2, you first have to install the .NET Framework.  However, the SQL installer doesn't do it for you; it just tells you that you have to install it.  So the normal way is to go into Server Manager, then 'Roles', then 'Add Roles'.  Once you're in the Add Roles wizard, choose 'Application Server'.  Here are some brief screens of the process.

As you can see I’ve already installed it on this server, but it’s pretty easy to figure out from here.  Just install the .NET Framework (it’ll actually already be checked for you) and you’re golden.  Now, that takes too much time for me, especially when I’m installing several boxes at once.  So what I like to do is call on powershell for this task.  I’m going to take some extra time here to show you a couple things, but the process once you get it down is much faster than the wizard.

First of all, you have to load the windows modules into your PS session.  To do it manually, you can open PS and type the following command:

> Import-Module ServerManager

Here’s a screen:

Once that's loaded, you'll be able to run the next command.

> Get-WindowsFeature

This will show you all the features available and which ones are already installed.  Here’s a screen of the output.

And of course, you’ll notice on mine, it’s already installed, but it’ll be unchecked on yours.  OK, now you’re ready to do the install itself.

And for that, you need yet another command, but don’t worry, it’s easy. 

> Add-WindowsFeature AS-NET-Framework

Notice I put an arrow by the name of the feature above; that's what you'll feed to the Add-WindowsFeature cmdlet.  So in case you didn't catch it, the syntax is simply the cmdlet followed by the name of the feature you want to install.

Here’s a screen of the line as it’ll look when you type it.

And here’s what it looks like while it’s installing:

And that’s it guys.  The install is actually much quicker from the cmdline and you don’t have to worry about making a mistake through the GUI.  And of course you don’t have to go through all these steps every time.  Why not write a short script to do it for you?  It would look like this:

Import-Module ServerManager
Add-WindowsFeature AS-NET-Framework

That’s it.  That’s all it takes to install it from a script.  Now you can save that off to the network somewhere and just use it whenever you like. 
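
And if you wanna be a little more careful, you can have the script check whether the feature's already on the box before it does anything.  Same two cmdlets, just wrapped in a check:

Import-Module ServerManager
# only kick off the install if the feature isn't already installed
if (-not (Get-WindowsFeature AS-NET-Framework).Installed) {
    Add-WindowsFeature AS-NET-Framework
}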

OK, that’s all I’ve got.

Oh yeah, one more thing… you don't have to import the ServerManager module every time you run this.  You can use the PS shell that loads all the Windows modules for you (on Windows 2008 R2 it's the 'Windows PowerShell Modules' shortcut under Administrative Tools).

Ok, THAT’S all I’ve got.

Read my mind

Man, talk about your all-time screw-ups!!

OK, here's what happened.  I got a ticket the other day to truncate a table in a prod DB.  Simple enough, right?  So I wrote the guy and asked, are you sure you want me to truncate this table?  He said, yeah, it's taking up over half our space.  I said ok, just give me a couple mins.  So I connected and truncated the table.

Well, then I get a call like 10 mins later and the guy is frantic.  What happened to my data?  Well, you told me to truncate the table.  NOT ALL OF IT!!!  So this is a valuable lesson, isn't it?  Always be sure what's actually being asked for, and never assume.  Of course, I didn't think I was assuming anything; we even get tickets to truncate entire DBs all the time, so it's not that big a deal for us.  Anyway, just be aware that many users have a completely different database vocabulary than you do.
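
And for anyone who needs the reminder of why this stung so bad: truncate doesn't take a WHERE clause.  It's all or nothing.  Hypothetical table name, of course:

TRUNCATE TABLE dbo.Orders;   -- every row, gone; there's no filtering it
-- if the user only wanted some of the data gone, they actually meant something like:
DELETE FROM dbo.Orders WHERE OrderDate < '20090101';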

Is this industry fun or what?

Powershell to recycle error log

Here's a really good one. I noticed that very few of my boxes had a job to recycle the SQL error logs. This is important because unless you specifically do it, the error log only recycles when the service is burped in some way. For those of you who aren't up on the lingo, recycling the error log means closing down one log and creating a new one. The error logs are just txt files kept on the server, so recycling creates a new physical txt file. And the reason you care is that some servers don't get bounced very often at all, and the files can get quite large. I have one server with a 40GB error log. And when I try to open it, it sits there for like 2hrs and then finally dies cause there aren't enough system resources to deal with a file that big. So you can see why you want to keep the size of these files down.

And like I said, the error log only recycles when you burp the service or do it manually. And the way you do it manually is by running this code:

EXEC master.dbo.sp_cycle_errorlog

But doing it manually isn’t good enough because it has to be done on a regular basis. I prefer to recycle my logs every day. It’s a good generic schedule and honestly I don’t ever remember needing anything more granular. OK, so a new log every day is what we’re after. That means we need to put this code in a job and have it run every day at the same time.

So ok, that set the stage for what we’re wanting to do here. Now we can get into what we actually wanna talk about… and that’s how we push our new job to all of our boxes. Now, I’m going to do this a specific way and I’m doing it like this for a reason. I’ll explain once I’m done.

Here are the steps I want you to take to prep for this venture.

1. Create a job to occur every day (Midnight is preferable) and include the above code in the job step.
2. Be sure to set the job owner to sa.
3. Script the job and save it as a file somewhere.
4. You'll have to script the job separately for different versions of SQL. SQL2K stores job schedules differently (there are no shared schedule objects), so a job scripted from a 2K5+ box blows up on it. So have at least one script for SQL2K and below, and one for 2K5 and above. (There's a sketch of the job script right after this list.)
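
If you've never scripted one of these, here's roughly what the job boils down to on 2K5 and above. This is a hand-written minimal version, not the full script SSMS generates, so treat the names and times as placeholders:

USE msdb;
-- create the job, owned by sa
EXEC dbo.sp_add_job @job_name = N'Recycle SQL Error Log', @owner_login_name = N'sa';
-- one step that cycles the log
EXEC dbo.sp_add_jobstep @job_name = N'Recycle SQL Error Log',
     @step_name = N'Cycle error log', @subsystem = N'TSQL',
     @command = N'EXEC master.dbo.sp_cycle_errorlog;';
-- run it daily at midnight
EXEC dbo.sp_add_jobschedule @job_name = N'Recycle SQL Error Log',
     @name = N'Daily at midnight', @freq_type = 4, @freq_interval = 1,
     @active_start_time = 000000;
-- attach the job to the local server so it actually runs
EXEC dbo.sp_add_jobserver @job_name = N'Recycle SQL Error Log';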

 Now here’s the powershell script you’ll run.

$Server = 'localhost'      # the central box that holds the server list (same as my other scripts)
$StatsDB = 'DBStats'
$SourceSQLScript1 = "\\Server01\F$\SQLServerDBA\Scripts\Collector\SQLQueries\ServerListAllSQL2000.txt";

$SqlCmd1 = Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -InputFile $SourceSQLScript1

$SqlCmd1 | % {
    [System.String]$ServerName = $_.ServerName;
    $ServerName;    # print the name so you can watch it cycle through the boxes
    Invoke-Sqlcmd -ServerInstance $ServerName -Database "MSDB" -InputFile "\\Server01\F$\SQLServerDBA\Scripts\ServerConfig\RecycleSQLErrorLogJOB-SQL2K.sql" -SuppressProviderContextWarning
}

Now I’m going to explain what’s going on here and why:

1. $SourceSQLScript1 — This is the var that holds the location of the txt file where I store the query for the list of servers to run against. The inside of the file looks like this: select ServerID as InstanceID, ServerName from dbo.Servers (nolock) where IsSQL = 1 and SQLVersion = 2000. The reason I do it this way is that if I ever need to change the query, all of the scripts that call it automatically pick up the change. So putting the query in a file is a good thing.

2. $SqlCmd1 — here’s where I’m actually running the query in the step above and setting the results to this var. Now I’ve got an array of ServerNames I can cycle through to run my script on.

3. [System.String]$ServerName = $_.ServerName; — Here I'm just setting the current cursor iteration of ServerName to a var ($ServerName). It makes the script easier to read, and if something comes up where I need to make a minor adjustment to how the ServerName looks, I can mess with the var and not have to touch the rest of the script. Say later on down the line I discover I need to put [] around the name or something like that. This could also be done in the query file, but it's nice to have the option here as well should I need something special for a single script.

4. $ServerName; — Here I’m just printing out the ServerName so I can watch my script cycle through the boxes. It’s my way of knowing what’s going on. I hate that blinking cursor.

5. invoke-sqlcmd — This is where the real magic happens. I’m connecting to each box and running the job script I saved above.

Here are some finer points of the logic for why I did things the way I did:

1. I put the server list query into a var. This is to allow you (or me) to get the server list any way we like w/o disturbing the rest of the query. You could have a hardcoded list of serverNames, or have them in a file… whatever you like. Here’s a really good example.

Instead of this:
$SqlCmd1 = Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -InputFile $SourceSQLScript1
You can have this:
$SqlCmd1 = "Server1", "Server2", "Server3", "Server4", "Server5"

And the rest of the script stays almost the same because the cursor runs off the var, and it doesn't care how the var gets populated. The one tweak with a plain list of names is that you'd reference $_ directly instead of $_.ServerName, since plain strings don't have a ServerName property.

2. I chose to run this separately for each version of SQL. So I ran it once for SQL2K, once for SQL2K5, etc. That's because, like I said, there are differences in the job scripts between versions. I could have done it all in a single script, but I really don't like coding that logic in there, and it increases the size of the script because now I have to check the version of SQL and make a decision before I can create the job, and point it to the right .sql script. That's too much work when all I have to do is change the query file and the script file between runs. And this way I can run them concurrently for each version simply by saving the script under different names and kicking them off at the same time.

3. This one is my favorite because it's what makes this so flexible. I chose to put the sql logic itself into a file and just call that file. I did it that way because not only is the sql logic too long for this type of script, but if I've got anything else I need to run against all my boxes, all I have to do is change the sql script I'm calling; I don't have to touch the rest of the script at all. So if I want to create or delete specific users on a group of boxes, all I have to do is put that code in a script, save it, and point this script at it. Or anything else, right? This is really the bread and butter of powershell and multi-server mgmt.

So guys, take this boiler plate script and methodology and apply it to your own stuff. I’ve taken a lot of stuff out of the script for this blog because my scripts are part of a much larger solution I’ve developed and it’s not relevant to what you’ll be doing. But the script works as-is and you can add whatever you want to it. And now you’ve got a generic way to push T-SQL out to your boxes, and a generic way to populate the server list it goes against. What else do you need really?

The hidden update

I had a cool situation today that didn’t come to me right away so I thought I’d share it with all of you.  We’re going to be talking about the lovely deadlock today.  I can’t remember if I’ve ever talked about it before, but I am now so there…

OK, here's the situation.  I was called over by our very sexy web team lead (George), who told me they were having deadlock issues on the web portal DB.  Fine, I'll put a server-side trace on it and see what comes up.  For this I used the Locks\Deadlock Graph event.  Once I got my info back, I noticed we had table1, table2, and table3 in the mix.  One session ran a delete on table1 against itself.  Then another session ran an update against table2 joined to table3.  The problem is that the update was deadlocking with the delete, and the delete was losing every time.  And also, why was the deadlock on table1?  The update doesn't even touch table1.

For starters, all the tables have an update trigger that pulls the inputbuffer and session info for the spid that ran the update, then puts that info in a log table.  I don't know why.  Unfortunately that wasn't the problem.  I checked, and none of the tables turned out to be views either, so that avenue was dead.  The problem was just a tiny bit buried, but I eventually found it.  There was another table in the mix… table4.  Table3 had ON UPDATE CASCADE set on its FK to table4, and table4 had an FK back to table1.  AH-HA… there's your connection.  And on top of that, there wasn't an index on the FK col in table1, so it was doing a scan.  Nice huh?

So my recommended fix was as follows:

1.  Get rid of the auditing update triggers.  If you really want to log the action, then put that code in an SP and call it after your update is done, but not as part of the main transaction.  Yes, I'm aware of the very minute risks in this, but they're far outweighed by completing your transaction so much faster.

2.  Put an index on the table1 FK column.  This is probably going to give you the biggest bang for your buck.  If it's not doing a table scan, it'll get in and out faster, so there'll be less chance of deadlocking with the delete.  I believe the delete also searches on the same col, so it would really be worthwhile.

3.  Use an updlock hint on the update query.  (There's a quick sketch of 2 and 3 right after this list.)
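
Here's roughly what 2 and 3 look like in code.  The names are all made up since the real ones got scrubbed for the post, so treat this strictly as a sketch:

-- cover the FK col on table1 that the cascade (and the delete) search on
CREATE NONCLUSTERED INDEX IX_table1_FKCol ON dbo.table1 (FKCol);

-- take update locks during the read phase instead of shared locks,
-- which cuts down on lock-conversion deadlocks with the delete
UPDATE t2
SET t2.SomeCol = t3.SomeCol
FROM dbo.table2 t2 WITH (UPDLOCK)
JOIN dbo.table3 t3 ON t3.table2ID = t2.ID;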

My whole plan here is to get these transactions as short as possible.  It's not enough to have efficient queries, because if you've got a bunch of other stuff in the transaction, you might as well be doing a table scan every time.  And I know that reading table1/table2/table3 and all that isn't easy to follow, but hey, we've gotta scrub these things for the internet, right?  Really the whole point is that sometimes you have to dig to find the source of something.  I knew the deadlock was on that other table and I could see the index scan in the execution plan, but that table wasn't listed in the query or in the trigger, so it had to be coming from somewhere.  The basic point is to remind you of some of the things that are possible so you remember to check for them too.

A Nice SQL Virus

I’m being forced off of my powershell kick here for a few mins to talk about viruses.

Why do companies refuse to allow scanning exceptions for DB files?  Do you really think anyone's written a SQL virus?  I really can't remember ever having seen an actual virus that attacks SQL files.  Actually, I take that back.  There is one virus that attacks DB files and drags performance so far down you can barely use the system.  In fact, in my performance realm I actually classify this virus as an official DOS attack against my DB.  It's called virus scan.  Virus scans are the only thing I've seen that specifically targets DB servers to bring down performance.

So if any of you non-SQL admins out there are running AV on your servers w/o exception policies for your DB files (at minimum the data, log, and backup files: .mdf, .ndf, .ldf, .bak, .trn), turn yourselves in, because you're officially launching a DOS attack against your own servers.

But I promise I’m not bitter.

Instead of working, I blog.