
The mini heart attack

Tell me if this scenario isn’t all too familiar.  You get a call in the middle of the night saying that something’s wrong with a SQL box.  It won’t respond at all.  Now, you’re the first one on the scene because we all know that if there’s a SQL box anywhere within 100 miles of the app, they’ll call the DBA first because it has to be the DB.  So anyway, you have plenty of time to think while you’re on the way to the scene.

That first step is the hardest.  You throw the covers off of you and expose your warm body to the cold night air. 

You swing your feet onto the floor and try to find something to wear that’s not gonna be too hot or too cold because you have no idea how long you’ll be up.  At this point you’re still grumbling about how wordy the helpdesk guy was when he called.  Why can’t these guys ever realize that we’re asleep and all we need is essential info?

Now you’re on your way down the hall and your thoughts turn to the problem at hand.  What could it be?  Did someone turn off the service?  Is CHECKDB taking too long and blocking everyone?

You just hit your desk and start logging in.  While you’re waiting for the desktop to come up it hits you… what’s the state of the backups?  Am I sure that I can recover this box should something happen?  What if this server’s down for good?  Oh CRAP, is this the box that sent the failed log backup alerts yesterday, the ones I just ignored?  I’ll be lucky if the log just filled up and I can do something about it, but if the box is down and I lost all that data because I didn’t feel like messing with it, I’m in trouble.

So you login and take a look at the server.  You’re able to TS in without any trouble and you breathe a small sigh of relief.  Well, at least the box itself isn’t down.  I can fix almost anything else that may be wrong.  You instantly look at your backup jobs to see if something has failed.  You see nothing out of the ordinary.

Next you look to see if anything’s blocking.  Nope, you see nothing of the kind.

You can query the DB and it’s online, so now your heart attack is completely gone and you turn your thoughts to what could actually be wrong.  So you call the person who submitted the ticket to begin with.  She reports that they couldn’t get into the DB for a few mins but that all seems fine now.  You tie up some loose ends on the phone and end the call.

You then close the lid on your box and as you walk back to bed you make a promise to yourself that you’ll never ignore another log backup alert ever again. 

Don’t let this happen to you.  Laziness is no excuse for not having a backup, and answering the alert is much easier than answering your boss’s call about why you lost half a day of data.  Don’t you have alerts set up?  Then why did this happen?  That’s a tough question to answer.

The point of all this is that things can go wrong on a moment’s notice.  Restoring your DB is like a car wreck.  It can happen at any time and you won’t have any notice.  So if you’re not sure… and I mean SURE about the state of your backups, then get that way.  Do whatever it takes to make sure you’re covered and you know it.  And that includes doing the actual restore.  Do you know exactly what it’ll take to restore your server?  The last thing you want to do is figure out what has to be done while you’re under the gun.  So unless you know exactly what it’ll take to restore something, you don’t have a solid plan.  So avoid the bread lines and these mini heart attacks and do your due diligence.  The life you save may be your own.

The stupid have fallen

This is a followup to my previous post here.

Wow, what a day.  Ok, so I just got off the phone with the tech lead for that company.  It was enlightening to say the least.  It was really a longer call than I expected or that I even wanted, but my blood pressure’s gone down a bit now.  I obviously can’t recount the entire conversation for you, but here are the highlights:

V = Vendor

M = Me

V: Hi Sean, this is Josh (not his real name) with X and I was told you wanted to talk to me.

M: No, the last thing in the world I wanna do is talk to you, but I feel in order to get our issue resolved I have no choice.

V: Yeah, Cornhole (not his real name) told me you got upset with him when he was just trying to explain to you what your issue was and then you insisted on speaking with me.

M: This is just incredible.  No, I didn’t get upset with him for explaining the issue to me.  I got upset with him when I told him his explanation was flat-out wrong and then he called me an idiot.  He didn’t even hint at it.  He called me an idiot outright.

V: Really?  Cause that’s not the story I got at all.  I heard that he was calmly explaining that your backups were the issue and you just exploded on him.

M: No, that’s NOT what happened at all.  That little turd actually called me stupid for not buying his asinine explanation for our issue, and I only exploded after he did that a couple of times.

V:  Well what don’t you believe about it then?  Because we’ve seen that many times where backups will cause processes to block.

M: No you haven’t.  Listen, I realize that everyone can’t spend their days studying SQL, but one simple fact is that backups don’t block processes.  Period.  I’ve been doing nothing but DBs in SQL Server for over 15 yrs against TBs of data and nowhere have I ever seen a backup block a process.  The problem is that you’ve got like 3 tables with TABLOCK in that sp and that’s what’s causing your blocking.  Don’t you think it’s funny that it only clears up AFTER you kill that process?

V: Where did you get the code?  Those sps are encrypted.

M: Encryption only stops amateurs.  And while we’re at it, what’s this script I hear about that you’ve got to unconfuse SQL to give you the real spid for the process?

V: Where did you hear that backups don’t block processes?  It’s well-known here that it’s one of our biggest problems.

M: No, your biggest problem is your TABLOCKS.  And I heard it from the truth. 

V: What is a tablock?

M: A table-level lock.  You’re locking the entire table just to read data.

V: I don’t know about that, I’ve never seen the code because it’s encrypted.

M: Well I do and that’s your problem.  Now what about this script I kept hearing about from Cornhole?

M: Did you get my email?

V: Yeah.

M: And you can see the place in the code where you’re locking the whole table?

V: Yeah.

M: So how is it that you lock the entire table and you claim that it’s the backup blocking everything?  Haven’t you noticed that the backup only decides to block stuff when this process is running?

V: That does sound a little suspicious when you put it that way.  What can we do?

M: Let me take those TABLOCKs out of there so I can query these tables without locking everything up.

V: I just saw your email signature and I’ve never heard of an MVP before, but I just looked it up.  Are you really one of those guys?

M: Yeah.

V: Wow, so I guess you really know what you’re doing then.

M: I’ve hung a few DBs in my day, sure.

V: Do you mind if I write you again if I have any questions?

M: I suppose that depends on whether you’ll let me change that stupid sp or not.

V: Yeah, go ahead.  I’ll have to have a talk with our devs tomorrow and try to convince them to put this in our next build.  Honestly, I’m only a tech lead here because I’ve been here a few months longer than anyone else on the support team.  We get all this stuff from the devs and they won’t tell us anything.

M: So what about this script I keep hearing about.  The one that tells you how to unconfuse SQL and give you the right spid for a process?

V: That’s a script the devs have and they won’t show it to me either.  I just tell my guys something else so they won’t know I haven’t seen it.

M: Wow, that sucks man.  You do know though that a script like that doesn’t exist, right?  It’s complete bullshit.

V: Yeah, I hate this place.  I’ve been suspecting for a while now that they were lying to me, but what can I do?  I need this job.  I’m just lucky to be working.  I have no idea what I’m doing here.

M: I don’t know what to say to that.  You’re right, you are lucky to be working these days.  But pick up a couple books on your own and learn SQL.  Don’t rely on them.  There are websites, blogs, etc.  I myself run a site.

So in short, I really started to feel for this guy.  He ended the call apologizing 100x for the grief they’ve caused me and he’s going to go out right away and start watching my vids and trying to learn more.  He asked if he could write me with questions, and I said, of course dude, anytime.  That’s what I do.

And yeah, he sounded about 22 as well.  Turns out I was right.  I asked, and he was 21 and this was only his 2nd job; they brought him in from nothing and “trained” him.  I’d say he’s got some work to do.  But I personally consider this exchange a success.

Why can’t voodoo be real?

I had a very interesting talk with a vendor this morning.  Ok, it wasn’t really interesting as much as maddening.  I swear, it’s guys like this that are going to be the death of me.  I know I shouldn’t let them get to me but sometimes they do.  You know how it goes, you try to explain something and it just escalates until you’re upset.  Anyway…

OK, so they were having blocking issues in the DB.  The vendor was trying to convince the app team to turn off the SQL backups because they were interfering with their processes.  That’s when they called me.  Frankly, I can’t believe they called because the vendor was very adamant and this is the kind of thing our app groups do on their own without consulting us at all, so I’m more shocked than anything.  But I knew this was going to be a bad call the second my app guy told me what was going on.

Here’s the conversation as it took place just a couple hrs ago.  It took me this long to regain my strength.

V = vendor

M = me

M: Ok, I’m here now.  So explain to me what’s going on.

V: Well, we’re having severe blocking issues in the DB and after looking at it we noticed it’s the SQL backups causing it.  So what we need to do is stop the SQL backups.

M: Backups don’t hold locks in the DB so there’s no way for it to be causing blocking.

V: Of course backups hold locks.  Everything holds locks.

M: No, they don’t hold locks.  There’s never been a case where a backup has blocked a process.

V: We see it all the time.  And you’ll just have to trust me because I do this for a living.  I’m very familiar with this app and this is how we always fix this issue.

M: You always fix your poor coding issues by having people not backup their data?

V: You can backup your data, but you have to schedule downtime every week to do it… to keep it from blocking.

M: I really don’t want to finish this conversation, but I’m developing this morbid curiosity about where this is leading.

V: What does that mean?

M: Well, you clearly know nothing about SQL and you’re doing your customers such a huge disservice by making them turn off their backups when all you have to do is fix your query.

V: What do you mean I know nothing about SQL, you’re the one who thinks backups don’t cause blocks when I see that all the time.

M: No, what you see all the time is your process causing blocks and people running backups.  Because I’m on your server now and I can see the blocking you’re referring to and it has nothing to do with the log backup, it has to do with your process on spid 201.

V: I thought you were a DBA.  When there’s severe blocking in a system SQL Server sometimes reports the wrong spid.  Our tech lead assures us that this is coming from the backup.

M: Well, if you’re just going to insult me we can end this call right now because I really don’t have time for this.  I can’t teach every moron who writes an app for SQL everything they need to know to support it.

V: Now who’s being insulting?

M: Oh, that would still be you.  Because you know nothing about SQL and you hop on the phone and start calling me an idiot because you don’t know how to troubleshoot a simple blocked process.  Since we’ve been on the phone, I’ve looked at the sp that’s causing this and the problem is simple; you’ve got a hardcoded TABLOCK in 2 of your queries.  That’s what’s causing the blocking.  Have you ever bothered looking at this process to see what it’s doing?

V:  No, because it’s the backup causing it.  There’s nothing wrong with the code.  We run this at several customer sites and nobody ever complains.

M: Ok, let me try something different here… if SQL’s reporting a different spid for the blocking process, how does your tech lead know that this incorrect spid points back to the backup?  Why couldn’t it be pointing to one of the many other processes running right now?

V: He’s got a script that can tie them together.

M: I would love to see that script.  Maybe I could learn something, huh?  But if SQL itself gets confused about the spid and doesn’t report it right in sysprocesses, then how can he tie it to the right process in his script?  Wouldn’t SQL tie it to the right spid and present it to us the right way to begin with?

V: He won’t show anyone the script because he says it’s too advanced and we don’t need to know how to do those types of things. 

M: Wow, that’s convenient isn’t it?  Let me try something different here.  Because now this is like a train wreck I just can’t turn away from.  When you kill the backups at other client sites, does the process clear up right away?

V: No, unfortunately the database can’t recover from something that’s been blocked by a backup, so we have to kill our process too, and then things work fine.

M: My, you’ve just built your own little world in there haven’t you?  You’ve got excuses for everything.

V:  They’re not excuses.  It’s how databases work.  Where did they find you anyway?  Have you ever been to any SQL Server classes? Maybe they should send you to some.

M: No, I’ve never been to any SQL classes.  I’ve never needed them.  And if you call me an idiot again you’re gonna find out what else I can do.  In fact, I’m a hair away from calling the CEO of your company and letting him know what kind of stupid, snide little ass he’s got working for him.  So you’d better watch your tone with me, boy, and remember I’M the customer.  So you don’t tell me what to do and you don’t get on the phone with me and call me an idiot.  And here’s something else for you.  Before I even got on this call I ripped your access from the box because I don’t want you touching anything on our server.  Nobody’s turning off our backups, and nobody’s ever touching our server.  And I’m actually done with you.  I’m not going to troubleshoot this with you one more second.  Here’s what I want.  I want you to escalate this to your team lead and I’ll fight this out with him.  I want a call with him at the next available opportunity.  You are out of this issue.

V: Look, I’m sorry if you thought I was being

M: Shut up.  I’m done with you.  The next call I get had better be from your team lead.  Good-bye.

I’m telling you guys, if voodoo were real I’d be sticking pins in this guy’s ass for the next month.  And the sad thing is I doubt it’s his fault.  He’s got some jackass yanking his chain and this is probably just some kid who they’ve taught to be this way.  He sounded like he was probably in his early 20’s.  I still haven’t heard from his tech lead, btw.

Why won’t the log shrink?

I know that everybody always says you shouldn’t shrink your log files, but let’s face it, sometimes it really is necessary.  Maybe you had a big op that was out of the norm and it grew your log bigger than it usually runs, or maybe (and more commonly) your backups were failing for a few days before you noticed and the log is now huge and you need to get it back down to a more manageable size.  For whatever reason, you need to shrink your log.  Now, I’m gonna show you the T-SQL way because it’s what I’ve got handy and I’ve used this script for years.

So here’s the script and then we’ll discuss a couple things… because the script isn’t the meat of this post.  But this one will do all DBs on the server, so just use the logic if you don’t like the cursor.  And it has no effect on DBs that can’t have their logs shrunk so don’t worry about causing damage.

Declare @curDBName sysname

Declare DBName Cursor For
Select Name from sysDatabases

Open DBName
Fetch Next From DBName into @curDBName

while @@Fetch_Status = 0
Begin

DBCC ShrinkDataBase (@curDBName , TRUNCATEONLY)

Fetch Next From DBName into @curDBName

END

Close DBName
DeAllocate DBName

OK, so that’s fairly standard admin stuff.  Now, if you wanna check that you’ve had the effect you expected, you can run this command both before and after to make sure you’ve shrunk it.

DBCC SQLPERF(LOGSPACE)

And what you’re going to get from that is a result set that gives you the DBName, the log file size, and the % used.  What you want to look at is the log file size and the %Used.  If the size of the file is really big and the %Used is really small, that means your log file is a lot bigger than it needs to be.  So an example would be you’ve got a log file that’s 52000MB and 3% used.  That’s a lot of empty space.  So you’ve definitely got to use the above cmd to shrink it. 
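If you’ve got a lot of DBs, eyeballing that result set gets old.  Here’s a minimal PS sketch of the same check, assuming Invoke-Sqlcmd is loaded; the 1GB and 10% thresholds are just numbers I made up for the example:

$logs = Invoke-Sqlcmd -ServerInstance 'localhost' -Database master -Query "DBCC SQLPERF(LOGSPACE)"

# Flag anything over 1GB that's less than 10% used -- both numbers are example thresholds
$logs | ? { $_.'Log Size (MB)' -gt 1024 -and $_.'Log Space Used (%)' -lt 10 } | FT 'Database Name', 'Log Size (MB)', 'Log Space Used (%)' -auto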

But what happens if you shrink it with that cmd and it doesn’t shrink?  Or maybe it just shrinks a little bit?  Well, now you’ve got to troubleshoot why it’s not shrinking.  This is one of my standard interview questions and you’d be surprised how many people get it wrong.  As a matter of fact, I’ve never had anybody get it right.  So here’s what you do… pick up a book and become a DBA.  Ok, seriously though, it’s extremely easy.

Run this query in master to see what your log may be waiting on to truncate.

SELECT Name, log_reuse_wait_desc FROM sys.databases

What you’ll get here is a result that tells you what the log is waiting for before it can reuse the VLFs, and in this case, kill them so you can shrink the file.  There are any number of things that can show up here as the reason.  You could be waiting on a long transaction, or on a log backup, or on replication, etc.  So next time you have a problem getting a log to shrink, come here first and find out what it’s waiting for before you go guessing what the problem might be.
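And to round that out, here’s a minimal sketch of acting on the most common answer, LOG_BACKUP.  The server, DB name, and backup path are all made up, so plug in your own:

$server = 'localhost'
$db = 'MyDB'

$row = Invoke-Sqlcmd -ServerInstance $server -Database master -Query "SELECT log_reuse_wait_desc FROM sys.databases WHERE name = '$db'"

if ($row.log_reuse_wait_desc -eq 'LOG_BACKUP') {
    # The log is just waiting on a backup, so take one and then shrink
    Invoke-Sqlcmd -ServerInstance $server -Database master -Query "BACKUP LOG [$db] TO DISK = 'F:\SQLBackups\$db.trn'" -QueryTimeout 600
    Invoke-Sqlcmd -ServerInstance $server -Database master -Query "DBCC SHRINKDATABASE ([$db], TRUNCATEONLY)"
}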

That’s all I’ve got.

[EDIT]
Ok, I got a comment below that says BOL states that TRUNCATEONLY only works for data files. I checked and he’s right, BOL does say that. However, I’ve been using this method for years and I just ran a test on it individually using the code I pasted out of this post so BOL is wrong on this one. I ran my test on SQL2K8 but I don’t see any reason why it would change significantly with a lower version.

Audit service accounts with Powershell

I thought I’d kick off the week with a useful little PS script I wrote last week.  It happens from time to time that you need to find out what AD accts your SQL services are running under.  It could be that you simply want to see how diverse they are, or maybe you want to change the password of an acct and you want to see what the impact will be.  All the same, it can be useful to audit the startup accts for your SQL services.

So here’s a little script you can use to get that done.  This again is part of a much larger process I have, but I’ve pulled it out and simplified it to its base components for you.  I’ve also added the DB side of the process for completeness.

Here’s the PS piece:

$Server = 'localhost'
$StatsDB = 'DBStats'
$SourceSQLScript1 = "\\Server1\F$\SQLServerDBA\Scripts\Collector\SQLQueries\AllSQLBoxes.txt";

$SqlCmd1 = Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -InputFile $SourceSQLScript1
$SqlCmd1 | % {
        $_.ServerName; #### Print ServerName so you can watch it work
        [System.Int32]$ID = $_.InstanceID;
        [System.String]$ServerName = $_.ServerName;
        $ExecutionDateTime = Get-Date -Format "MM/dd/yy HH:mm:ss";

        #### Pull every SQL-related service off the current box
        $a = Get-WmiObject win32_service -ComputerName $ServerName -Filter "DisplayName like '%sql%'" -ErrorVariable err -ErrorAction SilentlyContinue

        $a | % { $DisplayName = $_.DisplayName;
                $StartName = $_.StartName;

                #### Log the service and its startup acct to the central table
                Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -Query "Collector.spServiceAcctInsert '$ExecutionDateTime','$ID','$DisplayName','$StartName'"
        }
}

Most of this is the same as the other scripts I’ve posted so I’m not really gonna go into every line like I have in the past.  I am going to explain the params for the invoke-sqlcmd line though.  In this line I’m just calling an SP that I’ve put in place to log these details to a table and you can use as much or as little of this as you wish.  The $ID param is the ID of the server.  In my complete solution I have a server table and I log things to other tables using the ServerID that comes from that table.  You’re free however to just capture the ServerName here instead.  But you notice that in the query, I get the ID and the ServerName and I use those in the script.  I’m just presenting you with how I do it and you can change it. 

Here’s the SP that I use in the DB that gets called by this script.  It’s very simple. 

CREATE PROCEDURE [Collector].[spServiceAcctInsert]
      @ExecutionDateTime datetime,
      @InstanceID int,
      @ServiceName varchar(50),
      @StartName varchar(50)
AS

Insert dbo.ServiceAcct
Select @ExecutionDateTime, @InstanceID, @ServiceName, @StartName

And now here’s the table that you’re logging to:

CREATE TABLE [dbo].[ServiceAcct](
      [ID] [int] IDENTITY(1,1) NOT NULL,
      [ExecutionDateTime] [datetime] NULL,
      [InstanceID] [int] NULL,
      [ServiceName] [varchar](50) NULL,
      [StartName] [varchar](50) NULL
)

And that’s all there is to it. From here, now that you’ve got all of your service accts and the AD accts associated with them, you can easily query this table to see the impact of changing a password or just see how bad your security is. This is useful info to have. You don’t have to run this kind of thing often though. In fact, it’s not a query that I have scheduled. I typically just run it when I want to know something specific, but you may have a reason to schedule it.
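For instance, here’s a minimal sketch of that password-impact query, joining back to the dbo.Servers table from my larger solution; the acct name is made up:

# Find every service on every box that runs under a given AD acct
Invoke-Sqlcmd -ServerInstance 'localhost' -Database 'DBStats' -Query "
SELECT s.ServerName, a.ServiceName, a.StartName
FROM dbo.ServiceAcct a
JOIN dbo.Servers s ON s.ServerID = a.InstanceID
WHERE a.StartName = 'MYDOMAIN\sqlsvc'" | FT -auto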

OK guys, that’s all I’ve got on that.  Enjoy powershelling.

Today’s Powershell Fun

I’m having big fun with PS today because I’m installing several new SQL Server 2008 R2 boxes on Windows Server 2008 R2.  Where does PS come into play you ask?  Well, good question… I only wish I had a good answer.  Ok, I do.

In order to install SQL Server 2008 R2 on Windows Server 2008 R2 you have to first install the .NET Framework.  However, the SQL installer doesn’t do it for you.  It just tells you that you have to install it.  So the normal way is to go into Server Manager, then to ‘Roles’, and then ‘Add a role’.  Once you’re in the add role wizard choose ‘Application Server’.

From there it’s pretty easy to figure out.  Just install the .NET Framework (it’ll actually already be checked for you) and you’re golden.  Now, that takes too much time for me, especially when I’m installing several boxes at once.  So what I like to do is call on powershell for this task.  I’m going to take some extra time here to show you a couple things, but the process once you get it down is much faster than the wizard.

First of all, you have to load the windows modules into your PS session.  To do it manually, you can open PS and type the following command:

> Import-Module ServerManager

Now you’ll be able to run the next command.

> Get-WindowsFeature

This will show you all the features available and which ones are already installed.

On my box it shows as already installed, of course, but it’ll show as available on yours.  OK, now you’re ready to do the install itself.

And for that, you need yet another command, but don’t worry, it’s easy. 

> Add-WindowsFeature AS-NET-Framework

The name of the feature above (AS-NET-Framework) is what you’ll feed to the Add-WindowsFeature cmdlet.  So in case you didn’t catch it, the syntax is simply the cmdlet and the name of the feature you want to install.


And that’s it guys.  The install is actually much quicker from the cmdline and you don’t have to worry about making a mistake through the GUI.  And of course you don’t have to go through all these steps every time.  Why not write a short script to do it for you?  It would look like this:

Import-Module ServerManager
Add-WindowsFeature AS-NET-Framework

That’s it.  That’s all it takes to install it from a script.  Now you can save that off to the network somewhere and just use it whenever you like. 
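So if you saved it to the network as, say, InstallNetFx.ps1 (name and path made up), kicking it off on a new box is just:

& '\\Server01\F$\SQLServerDBA\Scripts\InstallNetFx.ps1'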

OK, that’s all I’ve got.

Oh yeah, one more thing… you don’t have to import the ServerManager module every time you run this.  You can use the PS shell that loads all the windows modules for you.  On Server 2008 R2 it’s the ‘Windows PowerShell Modules’ shortcut under Administrative Tools.

Ok, THAT’S all I’ve got.

Powershell to recycle error log

Here’s a really good one. I noticed that very few of my boxes had a job to recycle the sql error logs. This is important because unless specifically done, the error log will only recycle when the service is burped in some way. For those of you who aren’t up on the lingo, recycling the error log is when you close down one log and create a new one. The error logs are just txt files kept on the server. So recycling creates a new physical txt file. And the reason you care about this is because some servers don’t get bounced very often at all and the files can get quite large. I have one server with a 40GB error log. And when I try to open it, it sits there for like 2hrs and then finally dies cause there aren’t enough system resources to deal with a file that big. So you can see why you want to keep the size of the files down.

And like I said, the error log only recycles when you burp the service or do it manually. And the way you do it manually is by running this code:

EXEC master.dbo.sp_cycle_errorlog

But doing it manually isn’t good enough because it has to be done on a regular basis. I prefer to recycle my logs every day. It’s a good generic schedule and honestly I don’t ever remember needing anything more granular. OK, so a new log every day is what we’re after. That means we need to put this code in a job and have it run every day at the same time.

So ok, that set the stage for what we’re wanting to do here. Now we can get into what we actually wanna talk about… and that’s how we push our new job to all of our boxes. Now, I’m going to do this a specific way and I’m doing it like this for a reason. I’ll explain once I’m done.

Here are the steps I want you to take to prep for this venture.

1. Create a job to occur every day (Midnight is preferable) and include the above code in the job step.
2. Be sure to set the job owner to sa.
3. Script the job and save it as a file somewhere.
4. You’ll have to script the job for different versions of SQL.  SQL2K scripts job schedules differently (sp_add_jobschedule instead of the separate sp_add_schedule/sp_attach_schedule you get from 2K5 on), so a script from a newer box blows up on a 2K box.  So at least have one for SQL2K and below, and one for 2K5 and above.  There’s a minimal sketch of what such a job script boils down to right after this list.
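If you’d rather skip the GUI scripting step entirely, here’s a minimal sketch of the T-SQL a job script like this boils down to, wrapped in Invoke-Sqlcmd to stay in PS.  The job and schedule names are made up, and I’m using sp_add_jobschedule since it works on SQL2K as well:

Invoke-Sqlcmd -ServerInstance 'localhost' -Database msdb -Query "
EXEC dbo.sp_add_job @job_name = 'Recycle SQL Error Log', @owner_login_name = 'sa';
EXEC dbo.sp_add_jobstep @job_name = 'Recycle SQL Error Log', @step_name = 'Cycle errorlog', @subsystem = 'TSQL', @command = 'EXEC master.dbo.sp_cycle_errorlog';
-- freq_type 4 = daily, freq_interval 1 = every day, active_start_time 0 = midnight
EXEC dbo.sp_add_jobschedule @job_name = 'Recycle SQL Error Log', @name = 'Daily at midnight', @freq_type = 4, @freq_interval = 1, @active_start_time = 0;
EXEC dbo.sp_add_jobserver @job_name = 'Recycle SQL Error Log';"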

Now here’s the powershell script you’ll run.  I’ve included the $Server and $StatsDB vars from my other scripts so it’ll run on its own.

$Server = 'localhost'
$StatsDB = 'DBStats'
$SourceSQLScript1 = "\\Server01\F$\SQLServerDBA\Scripts\Collector\SQLQueries\ServerListAllSQL2000.txt";

$SqlCmd1 = Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -InputFile $SourceSQLScript1

$SqlCmd1 | % {

[System.String]$ServerName = $_.ServerName;

$ServerName;

invoke-sqlcmd -ServerInstance $ServerName -Database "MSDB" -InputFile "\\Server01\F$\SQLServerDBA\Scripts\ServerConfig\RecycleSQLErrorLogJOB-SQL2K.sql" -SuppressProviderContextWarning

}

Now I’m going to explain what’s going on here and why:

1. $SourceSQLScript1 — This is the var that holds the location of the txt file where I store the query for the list of servers to run against. Inside the file looks like this: select ServerID as InstanceID, ServerName from dbo.Servers (nolock) where IsSQL = 1 and SQLVersion = 2000. The reason I do it this way is because if I ever need to change the query, all of the scripts that call the query will automatically pick up the change. So putting the query in a file is a good thing.

2. $SqlCmd1 — here’s where I’m actually running the query in the step above and setting the results to this var. Now I’ve got an array of ServerNames I can cycle through to run my script on.

3. [System.String]$ServerName = $_.ServerName; — Here I’m just setting the current cursor iteration of the ServerName equal to a var ($ServerName). It just makes it easier to read and if something comes up where I need to make a minor adjustment to how the ServerName looks, then I can mess with the var and not have to touch the rest of the script. Again, reducing effects to the script if I need to make changes to the value. So it’s like later on down the line if I discover that I need to put [] around it or something like that. This could also be done in the query file, but it’s nice to have the option here as well should I need something special for a single script.

4. $ServerName; — Here I’m just printing out the ServerName so I can watch my script cycle through the boxes. It’s my way of knowing what’s going on. I hate that blinking cursor.

5. invoke-sqlcmd — This is where the real magic happens. I’m connecting to each box and running the job script I saved above.

Here are some finer points of the logic for why I did things the way I did:

1. I put the server list query into a var. This is to allow you (or me) to get the server list any way we like w/o disturbing the rest of the query. You could have a hardcoded list of serverNames, or have them in a file… whatever you like. Here’s a really good example.

Instead of this:
$SqlCmd1 = Invoke-Sqlcmd -ServerInstance $Server -Database $StatsDB -inputfile $SourceSQLScript1
You can have this:
$SqlCmd1 = "Server1", "Server2", "Server3", "Server4", "Server5"

 And the rest of the script stays the same because the cursor is run off of the var. It doesn’t care how that var gets populated.

 2. I chose to run this separately for each version of SQL. So I ran it once for SQL2K, once for SQL2K5, etc. That’s because, like I said, there are differences in the job scripts for the different versions. I could have done it all in a single script, but I really don’t like coding that logic in there and it increases the size of the script because now I have to check for the version of SQL and make a decision before I can create the job. I have to point it to the right .sql script. That’s too much work when all I have to do is change the query file and the script file between runs. And this way, I can run them concurrently for each version simply by saving the script under different names and kicking them off at the same time.

 3. This one is my favorite because it’s what makes this so flexible. I chose to put the sql logic itself into a file and just call that file. The reason I did this is because not only is the sql logic too long for this type of script, but if I’ve got anything else I need to run against all my boxes, all I have to do is change the sql script I’m calling. I don’t have to make any changes at all to the rest of the script. So if I want to create or delete any specific users off of a group of boxes, all I have to do is put that code in a script and save it, and then point this script to it. Or anything else, right? This is really the bread and butter for powershell and multi-server mgmt.

So guys, take this boiler plate script and methodology and apply it to your own stuff. I’ve taken a lot of stuff out of the script for this blog because my scripts are part of a much larger solution I’ve developed and it’s not relevant to what you’ll be doing. But the script works as-is and you can add whatever you want to it. And now you’ve got a generic way to push T-SQL out to your boxes, and a generic way to populate the server list it goes against. What else do you need really?

How to Monitor SQL Services with Powershell

Here’s the situation…

You get a call from one of your customers saying that the log has filled up on the DB and they can’t do anything any more.  So you connect to the server and find out that the log backups haven’t been running.  So you run the backup and everything is hunkydory.  But why did it fail to run in the first place?  Well about 3secs of investigation tells you that the Agent was turned off.  Ok, you turn it back on and go on about your business.  But this isn’t the way to do things.  You don’t want your customers informing you of important conditions on your DB servers.  And you certainly don’t want Agent to be turned off. 

And while there may be some other ways to monitor whether services are running or not, I’m going to talk about how to do it in PS.  There are 2 ways to do this in PS… get-service and get-wmiobject.  Let’s take a look at each one to see how they compare.

In the old days (about 2yrs ago), when all we had was the antiquated powershell v.1, you had to use get-wmiobject for this task because get-service didn’t allow you to hit remote boxes.  All that’s changed now so you can easily run get-service against a remote box with the -computername parameter.

get-service -computername Server2

And of course it supports a comma-separated list like this:

get-service -computername Server2, Server3

And just for completeness here’s how you would sort it, because by default they’re going to be sorted by DisplayName so services from both boxes will be inter-mingled.

get-service -computername Server2, Server3 | sort -property MachineName | FT MachineName, DisplayName, Status

Ok, that was more than just sorting wasn’t it?  I added a format-table (FT) with the columns I wanted to see.  You have to put the MachineName there so you know which box you’re going against, right?  And the status is whether it’s running or not.

Remember though that I said we were going to do SQL services, and not all the services.  So we still have to limit the query to give us only SQL services.  This too can be done in 2 ways:

get-service -computername Server2, Server3 -include "*sql*" | sort -property MachineName | FT MachineName, DisplayName, Status

get-service -computername Server2, Server3 | ?{$_.DisplayName -match "sql"} | sort -property MachineName | FT MachineName, DisplayName, Status

So here I’ve used the -include and the where-object (?).  They’ll both give you the same results; the difference is that -include filters as the cmdlet builds the list, while the where-object filters afterward on the pipeline.  So the -include is a bit more efficient because you don’t push all that extra text down the pipe only to throw it away.

And of course, you don’t have to use that inline list to go against several boxes.  In fact, I don’t even recommend it because it doesn’t scale.  For purposes of this discussion I’ll put the servers in a txt file on C:.  Here’s how you would do the same thing while reading the servers from a txt file, only this time you could very conveniently have as many servers in there as you like.  And when creating the file, just put each server on a new line like this:

Server2
Server3

So here’s the same line above with the txt file instead:

get-content C:\Servers.txt | %{get-service -computername $_ -include "*sql*" | sort -property MachineName | FT MachineName, DisplayName, Status}

This is well documented so I’m not going to explain the foreach(%) to you.

Ok, so let’s move on to the next method because I think I’ve said all I need to say about get-service.  But isn’t this just gold?

get-wmiobject

Earlier I was talking about what we did in the old days and I always used to recommend get-wmiobject because of the server limitation imposed on get-service.  However, does that mean that get-wmiobject is completely interchangeable with get-service now?  Unfortunately not.  I’m going to go ahead and cut to the chase here and say that you’ll still wanna use get-wmiobject for this task most of the time… if not all of the time, because why change methods?

You’ll notice one key difference between doing a gm against these 2 methods:

get-service | gm

get-wmiobject win32_service | gm

The get-wmiobject has more methods and more properties.

And the key property we’re interested in here is the StartMode.

If you’re going to monitor for services to see which ones are stopped, it’s a good idea to know if they’re supposed to be stopped.  Or even which ones are set to Manual when they should be set to Automatic.

And for this reason I highly recommend using get-wmiobject instead of get-service.

Here’s some sample code using the txt file again.

get-content C:\Servers.txt | %{get-wmiobject win32_service -computername $_ -filter "DisplayName like '%sql%'"} | FT SystemName, DisplayName, State, StartMode -auto

Notice that the names of things change between methods too, so watch out for that.  So like MachineName changes to SystemName.  You’ll also notice that I didn’t provide you with a full working example of a complete script.  That’ll be for another time perhaps.  The script I use fits into an entire solution so it’s tough to give you just a single script w/o also giving you all the stuff that goes along with it.  And that just gets out of the scope of a single blog post.

However, I’ll leave you with these parting pieces of advice when building your service monitor.

1.  Instead of pulling the servers from a txt file, put them in a table somewhere so you can run all of your processes from that location.

2.  Use get-wmiobject win32_service instead of get-service.  It’s more flexible.

3.  When you collect your data, just save it to a table somewhere instead of alerting on it right away.  In other words, there should be a collection and a separate alerting mechanism.  (There’s a minimal sketch of the collection piece right after this list.)

   *** Why you ask?  Well I’m glad you asked, because not asking something that should be asked is like asking something that shouldn’t be asked but in reverse.  Anyway though… I prefer to get a single alert on all my boxes at once instead of an alert for each box, or for each service.  And that kind of grouping is much easier to do in T-SQL than in PS.  Also, there may come a time when a service is down for a reason and you don’t want to get alerts on it but you still want to get alerts on the rest of the environment.  This is easier to do in T-SQL as well.  And finally, you may want to also attempt to start the services that are down, and that really should be a separate process so you can control it better.  Or you may just want to be alerted and handle them manually.  Again, what if the service is supposed to be down for some reason?  You certainly don’t want the collection process going out and restarting it for you.  And the collection can be a nice way to make sure you remember to turn the service back on when you’re done with whatever you were doing.  You’ll get an alert saying it’s down, and you’ll be all like, oh yeah, I totally forgot to turn that back on and my backups aren’t kicking off.  All the same, you really do want the collection, alerting, and action processes to be separated.  But that’s just me, you do what you want. ***

4.  Keep history of the status of the services.  You can look back over the last few months and see which ones have given you the most trouble, and then try to discover why.  It’s good info to have, and you may not realize how much time you’re spending on certain boxes until you see it written down like that.
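And here’s that collection sketch from #3.  It assumes the dbo.Servers list table from my other posts and a dbo.ServiceStatus history table, which is a name I just made up for the example; the alerting would then be a separate T-SQL job reading that table:

$CentralServer = 'localhost'
$StatsDB = 'DBStats'
$Now = Get-Date -Format "MM/dd/yy HH:mm:ss"

# Point 1: pull the server list from a table instead of a txt file
$Boxes = Invoke-Sqlcmd -ServerInstance $CentralServer -Database $StatsDB -Query "SELECT ServerName FROM dbo.Servers (nolock) WHERE IsSQL = 1"

$Boxes | % {
    $Box = $_.ServerName
    # Point 2: get-wmiobject gives us StartMode along with the current State
    Get-WmiObject win32_service -ComputerName $Box -Filter "DisplayName like '%sql%'" -ErrorAction SilentlyContinue | % {
        # Point 3: just collect here; alert from the table in a separate process
        Invoke-Sqlcmd -ServerInstance $CentralServer -Database $StatsDB -Query "INSERT dbo.ServiceStatus SELECT '$Now', '$Box', '$($_.DisplayName)', '$($_.State)', '$($_.StartMode)'"
    }
}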

Have you had a chance to do this?

Users can really stick it to you sometimes and today I got caught on a call with a user who I suppose was trying to stick it to me somehow. There were about 9 of us on the call and she asked me:
Have you had a chance to look at the old backup files yet?

And I was like, what old backup files?
She said, I wanted you to go through and make sure there aren’t any old backup files sitting out there from 3yrs ago taking up space. We’ll want all that stuff cleaned up before we switch servers.

I said, well was I supposed to do that? Is that something you’ve asked me to do?
She said, no, I was just wondering if you’ve had a chance to do it yet.
I said, yeah I’ve had many chances, but since nobody has asked me to then I haven’t done it. I’ve also had many chances to eat bananas and grind up cherries with a sharpie, but nobody’s asked me to do that either.
Would you like me to go look for extra backup files?

Yes, she said.

I said, ok then why didn’t you just ask me to instead of trying to make it look like I’m not doing something I’m supposed to do? I’ve had many chances to do many things including delete all the data in your system, but since nobody’s asked me to, I haven’t done that.

Well, she said, I was just trying to find out if you’ve had a chance to do that yet.
Yes, I get that, but you haven’t asked me to do it yet, so how could I have had a chance? Now you’re in here trying to make it look like you’ve asked me several times and I’ve just been slacking.

OK, so that’s basically how it went. Dealing with users sometimes can really be hard… especially when they forget a key component and then try to cover it up like it’s been on your tasklist forever. And something this trivial is just ridiculous. It takes 5mins to check for old backups and admitting that you forgot until now is no big deal. Sometimes we spend more time on CYA than actual work.

Data Explosion

Hey everybody… it’s been a while since I’ve blogged, but that’s what starting a new gig will do to you. I’ve had so many new things to rant about here and I’m sure I’ll be getting to those in the future, but right now I want to share a funny story with you.

We had a group of devs who were trying to run a fairly simple insert and it kept filling up the disk. And when I say it was filling up the disk, I mean like 50K rows were filling up like 200GB. So they came to me to see if I could fix it. This is where it gets fun.

It really didn’t take me too long in my investigation to find that they had the data file set to autogrow by 131,000%. That’s right, I said it! 131,000 percent. So now that I found the problem I was able to set it to something more reasonable, shrink the file and let them get on with their insert.

So it started me thinking about what other DBs in the place had similar issues because I’ve been hearing about disk space problems here and there. So I wrote a powershell to go out and check the file growth rates for every SQL box and I found a lot of the same type of stuff. There were a lot of them set for several thousand percent growth, several of them set for 1MB growth, and everything in between. In general, as a good generic setting to start from, I like to grow my files 1GB at a time. It’s a good round number that works in a lot of cases and then you can snipe the ones that need something else. And then I altered the powershell to go out and change the growth rates of all the DB files out there to 1GB. Life is good again and we have the beginnings of a happy shiny environment.
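Here’s a minimal sketch of the discovery piece, again assuming the dbo.Servers table from my other posts.  sys.master_files is 2K5 and up, so any 2K boxes would need a different query:

$Boxes = Invoke-Sqlcmd -ServerInstance 'localhost' -Database 'DBStats' -Query "SELECT ServerName FROM dbo.Servers (nolock) WHERE IsSQL = 1"

$Boxes | % {
    $_.ServerName;
    # growth is in 8KB pages when it's not a percentage, so /128 converts to MB
    Invoke-Sqlcmd -ServerInstance $_.ServerName -Database master -Query "
SELECT DB_NAME(database_id) AS DBName, name AS FileName,
       CASE is_percent_growth WHEN 1 THEN CAST(growth AS varchar(20)) + '%'
            ELSE CAST(growth/128 AS varchar(20)) + 'MB' END AS Growth
FROM sys.master_files" | FT -auto
}

# And to fix one by hand: ALTER DATABASE [MyDB] MODIFY FILE (NAME = [MyDB_Data], FILEGROWTH = 1GB)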

Oh yeah, and in the same discovery I also found 110 DBs set to autoshrink, and I took care of those too.