
Avoiding xp_cmdshell

This is gonna be a quick post, but I wanted to throw out a couple easy ways to avoid using xp_cmdshell.

Unfortunately this xp has gotten a bad rep, but only because it’s been abused.  If you use it properly there’s no reason you can’t continue to get some good use out of it for many years to come.  Actually, I use it for my backup routines for a couple reasons… first, when I wrote them there really wasn’t a better choice.  And second, while I could architect a solution around it easily enough, it’s ridiculous to go that far out of my way to code around something that isn’t a huge problem.  The solution would be more complicated than the original problem, so why bother?  Oh yeah, and since I’m in a mixed shop there’s also the problem of having something like PowerShell on every box, and I don’t wanna have to have a different routine for my SQL2K boxes than I do for all the rest.  So using xp_cmdshell is still a good fit for me.

Anyway though, there are some good reasons to avoid it when it’s not necessary (that’s part of using it responsibly).  So we’re going to talk about 2 XPs today that will help you do just that.

First, there’s xp_FixedDrives.  This will report all the drives on your server and the amount of free space on each in MB.  Now, it can’t replace a well-formed PS script that can give you a lot more info, but it’s a nice start when you just need to look at drive space real quick.
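It takes no arguments, so calling it is about as simple as it gets.  And if you want to do something with the output, you can dump it into a temp table first (a quick sketch; the temp table name and the 10GB threshold are just for illustration):

EXEC master.dbo.xp_FixedDrives

-- Or capture the output so you can alert on low space:
CREATE TABLE #Drives (Drive char(1), MBFree int)
INSERT #Drives EXEC master.dbo.xp_FixedDrives
SELECT * FROM #Drives WHERE MBFree < 10240 -- drives with less than 10GB free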

Second, we have xp_create_subdir.  This handy little xp will let you create a directory.  You just pass it the path you want to create and you’re golden.  So it would look something like this:

EXEC master.dbo.xp_create_subdir 'C:\Mydir\Anotherdir\theFinalDir'

And just like in DOS it’ll build the whole path for you so none of those dirs have to exist ahead of time.

Anyway, hope this finds someone well.

Get DB sums with PowerShell

Here’s a cute little piece of code to add up the sizes of all your DBs on a server.  It comes in handy when you’re trying to see how much space you’ll need to duplicate the server either for DR, or reporting, or even just to see how much backup space you’ll need.

Make sure you’re at the databases node of the PS SQL provider first:

>CD SQLSERVER:\sql\ServerName\default\databases

Then type this code (zeroing out $Total first so a re-run doesn’t double-count):

>$Total = 0

>dir | %{$Total += $_.size}

>$Total

Ok, that’s it.  Not too bad, huh?  Now, let’s work on making it look like something you can actually work with. 

Ordinarily you would prettify this by just dividing it by the measure you want to convert it to.  So if you wanted to convert it to GB, then that would look like this:

>$Total/1GB

However, you can’t do that here.  Here’s what you need to know.

This output is in MB, and the “/1GB” trick assumes that the output is in bytes.  So let’s assume that your $Total is 4189811.4375.  Remember now, that’s in MB.  So here’s what the 1GB trick will yield.

>$Total/1GB

0.00390206597512588

That’s not even close to right.  And again, that’s because the 1GB trick assumes your output will be in bytes.  So here, the best way to convert this to GB is to do the math yourself, like this:

>$Total/1024

4091.61273193359

Of course, you can still use one of the tricks to get the number you’re looking for, but since you’re already in MB you have to go 1 measure down to make it come out right. 

>$Total/1KB

4091.61273193359

So why is that?  Well, because 1KB is only a shortcut for 1024, and from MB, the conversion to GB is /1024.  And the problem of course with doing it this way is that it’s deceiving as hell.  Anyone else looking at this will think you’ve converted to KB because they may not realize that the original output is in MB.  So don’t use this one, I only threw it in there to show you how to come up with the right number.  It also shows you something about how the trick works.

Now, you can still use the 1GB trick, you just have to massage the data first.  So here we’re going to put it back into bytes and then convert it to GB.  This is really just to prove to you that the output for this trick really is assumed to be in bytes.

>($Total*1024*1024)/1GB

4091.61273193359

Personally I’d never do it this way.  Sure, it shows you and someone else what’s going on so you can read the code, but that’s also what comments are for.

>$Total/1024 # Convert to GB.  Can’t use /1GB because this output is in MB and the /1GB assumes the output to be in bytes.

See, what’s wrong with comments?

New Language Services

The new version of SSMS takes advantage of the new language services the team has been working on for so long now.  So far it’s actually looking pretty good.  You can finally refactor columns throughout all of your code because the language service comes directly from the engine, so it knows everything about the code you’re looking at.  It’s more of a dev feature, but plenty of DBAs will find it useful since we all wind up doing dev anyway.

There’ll be more on that in the future, but I just wanted to write this blurb to let everybody know that the language service is coming along.  This is the first real step in bringing us a much better intellisense.  Does that mean you’ll no longer need Red-Gate’s tool?  No, because SSMS still won’t format your code for you or do some of the other cool things SQLPrompt will do.  Things like * expansion and uniform table aliases are just 2 of the things Red-Gate brings to the table.  Unfortunately for them though, the new SSMS will start to close the gap, so they’ll have to come up with something else if they wanna stay in that game.

FileTable is here

FileTable is another new feature in Denali.  FileTable is the next generation of FileStream.  What this allows you to do is copy files into specified folders on your server and have them automatically stored in SQL.  So you define a FileTable in SQL and point it at a folder.  Now, anything you put in that folder shows up in the DB itself.  You can still view the files directly from the file system, only when you do, you’re actually reading them out of the DB.  So we’re really starting to blur the line between the DB and Windows.

This is one of those features that’s really cool.  To be able to drag a group of files into a folder and have them automatically added to the DB is huge.  Not only can you add/delete them, but you can also modify them.  You can set file properties using T-SQL and it’ll persist that change to the file system.  So now I have to wonder what the future of PowerShell will be, because working with a large number of files is something we do with PowerShell the most.  It’s one of the things PowerShell does really well.  But now that we’ve got this, if we’re able to define a folder as a DB table, then we don’t have to use PowerShell for these types of file ops anymore… we can just use set-based T-SQL.
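Just to give you an idea, here’s roughly what defining one looks like (a minimal sketch; the DB and table names are made up, and it assumes FILESTREAM is already enabled on the instance and the DB has a FILESTREAM filegroup):

-- Let the file system see the DB's FILESTREAM data.
ALTER DATABASE MyDB SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'MyDB')
GO
USE MyDB
GO
-- Create the FileTable; files dropped into its folder show up as rows here.
CREATE TABLE dbo.MyDocs AS FILETABLE
WITH (FILETABLE_DIRECTORY = 'MyDocs')
GO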

The possibilities are churning in my head.

Always On

I’m here at the keynote for day 2 of PASS and we’re watching a demo of their new Always On feature for Denali.  And it all looks well and good, but I can’t help but wonder what’s wrong with it.  Always On gives you the ability to define HA groups, and in just a few clicks you can configure automatic failover and sync to multiple servers.  Again, that all looks fine up on stage, but I, like many others, am withholding any opinion until I get in there myself.  I’m sure we all remember the dacpac fiasco.  It looked wonderful on stage and even I was very excited about it.  But when we all went to play with it we found out very quickly that it had more problems than solutions.

Trust me, I’ll be writing plenty on this in the future.

Trace flag 1807

This is something I haven’t had a use for in quite some time, but it actually came in handy today.  Of course, this isn’t something you’d want to make a living doing, but you’ll love to keep it in your hat and pull it out every couple of years when the situation arises.

This is all about solving problems.  Here’s the scenario.  A vendor is here and he’s trying to upgrade a DB schema.  The way the thing works is it basically copies the entire DB to new tables and then drops the old ones.  So at the apex of the operation the DB is about double in size.  The problem is that the server doesn’t have enough space to hold double the DB.  So what do you do?

Well, one of your options is to add space to the disk.  The problem there is that you can never just add space, right?  You have to provision it, etc.  So in this scenario this isn’t a good option at all.

Next you could find another SQL box on the same build and move the DBs over there.  You could then perform the upgrade on the other server, and then move the files back when you’re done.  Of course, the problem there is finding another box that doesn’t encroach on someone else’s production.  So you may or may not have this type of box lying around.

You could also think about just moving the files over to, say, a SAN or another server that has enough space and attaching them across the wire.  So it would look like this: you move the files to the other server (assume it doesn’t have SQL installed, because you don’t need it), then attach the DB from the original server, pointing it back to the UNC path where the files are now.  Then once the conversion is done, all you have to do is move the files back to the original server and you’re golden.

But wait… there’s a problem.  If you attach files across the network, SQL will run a check that they’re not local and throw an error.  And even if you map a drive you’ll get the same result.  So how can we get around SQL checking for network locations?  That’s where trace flag 1807 comes into play.  Just run a simple DBCC TRACEON (1807) first and the network check is disabled, so your attach will go through.
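So the whole thing ends up looking something like this (the server and file names here are hypothetical):

-- Disable the check for non-local files.
DBCC TRACEON (1807)
GO
-- Attach the DB straight from the UNC path where the files now live.
CREATE DATABASE MyDB
ON (FILENAME = N'\\OtherServer\SQLFiles\MyDB.mdf'),
   (FILENAME = N'\\OtherServer\SQLFiles\MyDB_log.ldf')
FOR ATTACH
GO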

Pretty cool huh?

And for those of you who like to see things in action, I’ve got this video of how this all works.

http://midnightdba.itbookworm.com/VidPages/Trace1807/Trace1807.aspx

Even MVPs make mistakes

We’re in the middle of our last mock go-live before this weekend’s prod change-over.  We’re using an SRDF process to move the DB files from the current prod to the new prod.  This is our big SQL R2 cluster and what usually happens is that they present the drives to the cluster and then I run my script to bring the DBs back online.
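For the record, that script isn’t doing anything magical; assuming the DBs were set offline before the SRDF sync, it boils down to something like this for each one (hypothetical DB name):

ALTER DATABASE MyDB SET ONLINE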

Well, what happened this time is that the drives weren’t added as SQL dependencies, and therefore SQL couldn’t see them in order to bring the DBs back online.  Well, I didn’t think to check that.  Instead, what I did was just delete the DBs, thinking that starting from scratch would be the best way to go.  What ended up happening, though, is that SQL deleted the newly presented data files.  So the moral of this story is that even though SQL can’t see the drives well enough to attach the files, it can apparently see them well enough to delete them behind your back.  Thanks, clustering team!

And so this isn’t entirely just a complaining session, here’s the query you can use to see which drives your clustered instance of SQL can see.

SELECT * FROM sys.dm_io_cluster_shared_drives

Now if you’ve got drives in the cluster that SQL can’t see, all you have to do is add them as a dependency to the SQL service and you’ll be fine.  And in Windows 2008 you can do that while the service is online, but in 2003 and below you have to take SQL offline to add them.

Oh, and just for completeness, you can tell which nodes are in your cluster by using this:

SELECT * FROM sys.dm_os_cluster_nodes

Here’s a quick update:  The errors I caused and fixed this morning had some residual problems.  Because once you make a mistake and try to fix it, you’ve already got mistakes on your mind and you’re not thinking clearly.  This is why I’m always advocating having restore scripts ready to go, so you don’t have to think about it in a crisis.  You always want things to be as easy as possible.  And this is also why I’m always saying that you should have standardized troubleshooting scripts and everyone should know how to use them.  You don’t want individual DBAs inventing new methods on the fly.  You should all know what scripts are being run to troubleshoot, if at all possible.

What does a bad query look like?

In my SQL Marklar blog today I discussed troubleshooting DB processes.  And I’m not going to re-hash all of it here but I did want to tell you about a use case that describes perfectly what I was talking about.

Not so long ago I got a call from one team and they told me that they had some server issues.  Everything was moving slow, they said.  Ok, so I got on and took a look, and nothing really jumped out at me.  Then I put a profiler trace on it to see if I could catch anything.  And of course, I knew nothing about the app or the processes so I really didn’t know what I was looking for, but you’ve gotta start somewhere, huh?

So there I am in profiler, just looking for long-running queries.  The problem is there were lots of queries I would consider long-running.  For some reason I focused in on a single SP that was taking like 5mins.  I pulled up the text of the SP and started looking through it.  It all seemed fairly standard.  I mean, it was long and not everything was perfect, but there was nothing out of the ordinary.

I contacted the app guy again and asked about it.  Does this SP typically take this long to run?  No, he says (and those of you who have seen My Cousin Vinny know where this is going).  So I thought eureka, I actually found something that may fix the issue.  So I got a couple valid params from him and ran the SP with them.  I also made sure to turn on execution plans and statistics io.  The query plan had some dings in it that experience has told me could easily have caused this kind of spike in resource usage.  The problem is that there was no fragmentation, and stats were up to date.  And in talking with the app guy he told me that they just archived a bunch of the data so it was down to like 200mill rows now.  So why would this thing be taking so long to return?  Moving on.
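(In case you’re newer to this, “turn on execution plans and statistics io” just means running these in the query window; nothing exotic:)

SET STATISTICS IO ON   -- logical/physical reads per table
SET STATISTICS TIME ON -- CPU and elapsed time per statement
-- plus Ctrl+M in SSMS to include the actual execution plan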

I found a copy of his QA system that had been copied over from prod the previous week and he assured me that they had changed nothing.  I could see the extra rows in the tables (copied before the archival), and the indexes were the same as in prod so that wasn’t the issue.  They had the same fill factor, everything.  In fact, everything I checked was identical except for the amount of data.  So why would having less data cause such a huge performance issue?  Moving on.

I decided that running this thing again and again on prod was probably a bad idea.  I’m just adding to the issue.  So I started doing the rest of my work on his QA box where I was the only spid.  And the hardware was similar as well (I love it when it works out that way).  So I ran the SP on this box and 5mins passed.  Then 10mins.  Then 15mins.  Then 20mins.  And sometime soon after that, the query returned.  I had collected all my stats along the way so I was golden.  It was getting the same execution plan as the prod version.  The results weren’t what I expected at all.  Why was the prod version performing so well in comparison?  So I called the app guy again and explained the situation.  Here’s more or less how the conversation went:

Me:  You know, I just don’t know what’s going on here.  I’ve been looking at this for a long time now and I’m getting further into a hole.  The prod version of this SP takes 5mins, and that’s even after the archival.  But when I run it on QA with all the data there it takes even longer.  If the prod query is acting up then I would expect the QA query to be a shorter time even with the extra data.

Guy:  Yeah that sounds about right to me.

Me:  What sounds right to you?  (I just got a bad feeling that something horrible had gone wrong)  (You know how you can instantly drop all the pieces into place and remember key words that make everything all of a sudden fit together?  Well, I just got that, but I wanted to hear him say it.)

Guy:  This SP usually takes about that much time, but since the archival it went down to 5mins.  We’ve been very pleased.

Me:  So you mean to tell me that when I came to you with this you didn’t find it necessary to tell me that the 5mins was an improvement?

Guy:  Well, I don’t know anything about DBs so I figured you could see that kinda thing already.

Me:  I see.  Well that clears up that mystery.  Now I’ve gotta go back and start over with this whole process.

Guy:  Well I can tell you the one that’s probably causing the issue.

Me:  Oh yeah?  How’s that?

Guy:  Because the slowness is only in our billing section and that’s controlled by just a few queries.  I can give you the names of the SPs and you can look at those.  There are only like 5 of them and since we’re having a hard time pulling up a list of clients it’s likely going to be the one that controls the search on that.

Me:  I see.  So you were just never going to tell me that?  I’ve been messing with this for 2hrs and you could have given me all this info before and saved me tons of time.

Guy:  Well, again, I don’t know anything about DBs and I figured you could see all that.

Me:  You thought I could see the web app from the DB?

Guy:  You can’t?

Me:  Kill me.

So ok, it turned out to be one of the 5 he gave me.  It had a bad query plan.  I wasn’t able to determine that all on my own, btw.  I had to recompile each one of them until I found the bad one.  And that’s because I didn’t have a perf baseline like I discussed on Marklar.
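For reference, marking an SP for recompile is a one-liner (the SP name here is made up); it just forces a fresh plan the next time it runs:

EXEC sp_recompile N'dbo.ClientSearch'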

So there are a couple lessons to learn here, but I think you can glean them for yourself.  The whole point of this though is that making assumptions about processes is bad, and no matter what looks like a long-running query to you, it may in fact be performing better than usual.

Post-install FUN!

This’ll be a quick one, but I just wanted to throw you guys another useful script.  Now, whether you decide to make this work in PS or not is up to you.  Most of my post-install stuff is still in T-SQL, so that’s how I’m giving this to you.  Anyway, it’s something that gets overlooked quite often, so I thought I’d just throw it up here real quick.

There are 2 things I like to do right away when I install SQL.  One is to set the job history max rows and the max rows per job, and the other is to set the number of error logs SQL will keep before deleting them. 

So this first one configs the job logging.  I like to keep a lot of logs for my jobs.  I’ve found that more often than not the default setting isn’t adequate if you want a real picture of what’s happening with your jobs.  So I increase it right off the bat.

USE [msdb]
GO
EXEC msdb.dbo.sp_set_sqlagent_properties
     @jobhistory_max_rows = 100000,
     @jobhistory_max_rows_per_job = 10000
GO

This 2nd script is what I use to set how many of the logs get stored.  Of course, you could also use a simple PS script to save the logs off to a different folder so you have them on hand if you like, but that’s up to you.

USE [master]
GO
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE', N'Software\Microsoft\MSSQLServer\MSSQLServer', N'NumErrorLogs', REG_DWORD, 30
GO

The thing I really like about both of these scripts is that they work on SQL2K and up.  I’ve tested these all the way up to R2 and they’re both just fine.  So if you don’t have some standardized setup scripts you run on all your boxes when you install them, then let these be your first.  The more things are the same in your shop the easier it’ll be to admin things.  You’ll know what’s happening on your boxes and you have fewer things to remember.  And scripting becomes much easier too because if things run the same way and are named the same thing, then you know exactly what to look for in your scripts. 

Have fun.

The mini heart attack

Tell me if this scenario isn’t all too familiar.  You get a call in the middle of the night saying that something’s wrong with a SQL box.  It won’t respond at all.  Now, you’re the first one on the scene, because we all know that if there’s a SQL box anywhere within 100 miles of the app, they’ll call the DBA first because it has to be the DB.  So anyway, you have plenty of time to think while you’re on the way to the scene.

That first step is the hardest.  You throw the covers off of you and expose your warm body to the cold night air. 

You swing your feet on the floor and try to find something that’s not gonna be too hot or too cold because you have no idea how long you’ll be up.  At this point you’re still grumbling about how wordy the helpdesk guy was when he called.  Why can’t these guys ever realize that we’re asleep and all we need is essential info?

Now you’re on your way down the hall and your thoughts turn to the problem at hand.  What could it be?  Did someone turn off the service?  Did CHECKDB take too long and is blocking everyone? 

You just hit your desk and start logging in.  While you’re waiting for the desktop to come up it hits you… what’s the state of the backups?  Am I sure that I can recover this box should something happen?  What if this server’s down for good?  Oh CRAP, is this the box I got the alerts about the failed log backups on yesterday and just ignored?  I’ll be lucky if the log just filled up and I can do something about it, but if the box is down and I lost all that data because I didn’t feel like messing with it, I’m in trouble.

So you login and take a look at the server.  You’re able to TS in without any trouble and you breathe a small sigh of relief.  Well, at least the box itself isn’t down.  I can fix almost anything else that may be wrong.  You instantly look at your backup jobs to see if something has failed.  You see nothing out of the ordinary.

Next you look to see if anything’s blocking.  Nope, you see nothing of the kind.

You can query the DB and it’s online, so now your heart attack is completely gone and you turn your thoughts to what could actually be wrong.  So you call the person who submitted the ticket to begin with.  She reports that they couldn’t get into the DB for a few mins, but that all seems fine now.  You tie up some loose ends on the phone and end the call.

You then close the lid on your box and as you walk back to bed you make a promise to yourself that you’ll never ignore another log backup alert ever again. 

Don’t let this happen to you.  Laziness is no excuse for not having a backup, and answering the alert is much easier than answering your boss’s call about why you lost half a day of data.  Don’t you have alerts set up?  Then why did this happen?  That’s a tough question to answer.

The point of all this is that things can go wrong on a moment’s notice.  Restoring your DB is like a car wreck: it can happen at any time and you won’t have any notice.  So if you’re not sure… and I mean SURE about the state of your backups, then get that way.  Do whatever it takes to make sure you’re covered and you know it.  And that includes doing an actual restore.  Do you know exactly what it’ll take to restore your server?  The last thing you want is to have to figure out what has to be done while you’re under the gun.  So unless you know exactly what it’ll take to restore something, you don’t have a solid plan.  So avoid the bread lines and these mini heart attacks and do your due diligence.  The life you save may be your own.
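And if you want a concrete place to start, keep a query like this handy so “am I sure?” takes ten seconds to answer (a quick sketch; it assumes your msdb backup history hasn’t been purged):

-- Last full and log backup per DB, straight from msdb history.
SELECT d.name,
       MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END) AS LastFull,
       MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END) AS LastLog
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b ON b.database_name = d.name
GROUP BY d.name
ORDER BY d.name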