
How to Monitor SQL Services with PowerShell

Here’s the situation…

You get a call from one of your customers saying that the log has filled up on the DB and they can't do anything anymore. So you connect to the server and find out that the log backups haven't been running. You run the backup and everything is hunky-dory. But why did it fail to run in the first place? Well, about 3 seconds of investigation tells you that the SQL Server Agent was turned off. Ok, you turn it back on and go on about your business. But this isn't the way to do things. You don't want your customers informing you of important conditions on your DB servers. And you certainly don't want Agent to be turned off without you knowing about it.

And while there may be some other ways to monitor whether services are running or not, I’m going to talk about how to do it in PS.  There are 2 ways to do this in PS… get-service and get-wmiobject.  Let’s take a look at each one to see how they compare.

In the old days (about 2 years ago), when all we had was the antiquated PowerShell v1, you had to use get-wmiobject for this task because get-service didn't allow you to hit remote boxes. All that's changed now, so you can easily run get-service against a remote box with the -computername parameter.

get-service -computername Server2

And of course it supports a comma-separated list like this:

get-service -computername Server2, Server3

And just for completeness, here's how you would sort it, because by default the results are sorted by service name, so services from both boxes will be intermingled.

get-service -computername Server2, Server3 | sort -property MachineName | FT MachineName, DisplayName, Status

Ok, that was more than just sorting, wasn't it? I added a format-table (FT) with the columns I wanted to see. You have to put the MachineName there so you know which box you're going against, right? And the Status tells you whether it's running or not.

Remember though that I said we were going to do SQL services, and not all the services.  So we still have to limit the query to give us only SQL services.  This too can be done in 2 ways:

get-service -computername Server2, Server3 -include "*sql*" | sort -property MachineName | FT MachineName, DisplayName, Status

get-service -computername Server2, Server3 | ?{$_.DisplayName -match "sql"} | sort -property MachineName | FT MachineName, DisplayName, Status

So here I've used the -include and the where-object (?). They'll give you essentially the same results (one subtle difference: -include matches on the service Name while my where-object matched on the DisplayName, but for SQL services both contain "sql"), only the -include filters inside the cmdlet itself while the where-object pulls every service into the pipeline and then throws most of them away. So the -include is generally the more efficient of the two.
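
If you want to test that for yourself, measure-command will time each approach. Here's a quick sketch (Server2 and Server3 are just the same stand-in names as above):

measure-command { get-service -computername Server2, Server3 -include "*sql*" }

measure-command { get-service -computername Server2, Server3 | ?{$_.DisplayName -match "sql"} }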

And of course, you don't have to use that inline list to go against several boxes. In fact, I don't even recommend it because it doesn't scale. For the purposes of this discussion I'll put the servers in a txt file on C:. Here's how you would do the same thing while reading the servers from a txt file, only this time you can conveniently have as many servers in there as you like. And when creating the file, just put each server on its own line like this:

Server2
Server3

So here’s the same line above with the txt file instead:

get-content C:\Servers.txt | %{get-service -computername $_ -include "*sql*" | sort -property MachineName | FT MachineName, DisplayName, Status}

This is well documented so I’m not going to explain the foreach(%) to you.
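
One more thing before we move on. For monitoring you usually only care about the services that aren't running, so you can tack another where-object onto that same line to show just the stopped ones… something like this:

get-content C:\Servers.txt | %{get-service -computername $_ -include "*sql*" | ?{$_.Status -eq "Stopped"} | FT MachineName, DisplayName, Status}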

Ok, so let’s move on to the next method because I think I’ve said all I need to say about get-service.  But isn’t this just gold?

get-wmiobject

Earlier I was talking about what we did in the old days, and I always used to recommend get-wmiobject because of the remote server limitation imposed on get-service. However, does that mean that get-wmiobject is completely interchangeable with get-service now? Unfortunately not. I'm going to go ahead and cut to the chase here and say that you'll still wanna use get-wmiobject for this task most of the time… if not all of the time, because why change methods?

You'll notice one key difference when you run a gm (get-member) against these 2 cmdlets:

get-service | gm

get-wmiobject win32_service | gm

The get-wmiobject has more methods and more properties.
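
If you want to put a number on it, count the properties each one gives you:

(get-service | gm -membertype Property).Count
(get-wmiobject win32_service | gm -membertype Property).Count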

And the key property we’re interested in here is the StartMode.

If you’re going to monitor for services to see which ones are stopped, it’s a good idea to know if they’re supposed to be stopped.  Or even which ones are set to Manual when they should be set to Automatic.
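
Here's a quick sketch of what that looks like… this pulls back only the services that are stopped but set to start automatically, which are usually the ones that are supposed to be running (Server2 being a stand-in for one of your boxes again):

get-wmiobject win32_service -computername Server2 -filter "State = 'Stopped' and StartMode = 'Auto'" | FT SystemName, DisplayName, State, StartMode -auto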

And for this reason I highly recommend using get-wmiobject instead of get-service.

Here’s some sample code using the txt file again.

get-content C:\Servers.txt | %{get-wmiobject win32_service -computername $_ -filter "DisplayName like '%sql%'"} | FT SystemName, DisplayName, State, StartMode -auto

Notice that the names of things change between methods too, so watch out for that. MachineName in get-service, for example, becomes SystemName in win32_service. You'll also notice that I didn't provide you with a full working example of a complete script. That'll be for another time perhaps. The script I use fits into an entire solution, so it's tough to give you just a single script without also giving you all the stuff that goes along with it. And that gets out of the scope of a single blog post.

However, I’ll leave you with these parting pieces of advice when building your service monitor.

1.  Instead of pulling the servers from a txt file, put them in a table somewhere so you can run all of your processes from that location.

2.  Use get-wmiobject win32_service instead of get-service.  It’s more flexible.

3.  When you collect your data, just save it to a table somewhere instead of alerting on it right away. In other words, there should be a collection mechanism and a separate alerting mechanism. (There's a rough sketch of the collection piece after this list.)

   *** Why, you ask? Well I'm glad you asked, because not asking something that should be asked is like asking something that shouldn't be asked, but in reverse. Anyway though… I prefer to get a single alert on all my boxes at once instead of an alert for each box, or for each service. And that kind of grouping is much easier to do in T-SQL than in PS. Also, there may come a time when a service is down for a reason and you don't want to get alerts on it, but you still want to get alerts on the rest of the environment. That's easier to do in T-SQL as well. And finally, you may want to attempt to start the services that are down, and that really should be a separate process so you can control it better. Or you may just want to be alerted and handle them manually. Again, if the service is supposed to be down for some reason, you certainly don't want the collection process going out and restarting it for you. And the collection can be a nice way to make sure you remember to turn the service back on when you're done with whatever you were doing. You'll get an alert saying it's down, and you'll be all like, oh yeah, I totally forgot to turn that back on and my backups aren't kicking off. All the same, you really do want the collection, alerting, and action processes to be separated. But that's just me, you do what you want. ***

4.  Keep a history of the status of the services. You can look back over the last few months, see which ones have given you the most trouble, and then try to discover why. It's good info to have, and you may not realize how much time you're spending on certain boxes until you see it written down like that.
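
And since I mentioned a sketch of the collection piece, here's a rough one. I'm assuming you've got invoke-sqlcmd available (it ships with the SQL Server client tools), and CentralServer, the DBA database, and the dbo.Servers and dbo.ServiceStatus tables are all just names I made up for illustration, so adjust them for your own shop:

# Pull the server list from a central table instead of a txt file (point 1 above).
$servers = invoke-sqlcmd -serverinstance "CentralServer" -database "DBA" -query "SELECT ServerName FROM dbo.Servers"

foreach($s in $servers)
{
    # Grab the SQL services from each box and save them to a history table (points 3 and 4).
    get-wmiobject win32_service -computername $s.ServerName -filter "DisplayName like '%sql%'" | %{
        $insert = "INSERT dbo.ServiceStatus (SystemName, DisplayName, State, StartMode, CollectDate) VALUES ('$($_.SystemName)', '$($_.DisplayName)', '$($_.State)', '$($_.StartMode)', GETDATE())"
        invoke-sqlcmd -serverinstance "CentralServer" -database "DBA" -query $insert
    }
}

From there the alerting piece is just a T-SQL query against dbo.ServiceStatus, which is where the grouping and the exceptions I talked about in point 3 come in.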

The Untunable Database

There are some DBs that just can’t be tuned any more than they already are (or aren’t). A good example of this is an application that hits a DB and never qualifies any of its queries. They all hit with select * and no where clause. There’s really nothing you can do to increase the performance short of just throwing more spindles at it. But that’s not really what I’m thinking about right now. What I’ve got on my mind now is an application and DB that just can’t be tuned no matter what you do because the business owners don’t see the benefit of making the changes.

I saw that a lot when I first got to my current gig. We had queries doing horrendous things and taking several hours to return and nobody cared. The end users had been running these queries for years and were happy with them. They didn’t care that the server was maxed out all the time and that they had to wait 12hrs for a report to return. Now, I don’t have to tell you that as a DBA that just drives me insane. Not to mention that it gives me nothing to do. Why am I even here then?

So with that in mind, I had to go a little cowboy on them and just start making minor changes that proved my point. I really can’t stress enough that I’m against going cowboy on any DB and I don’t care who you are. But there are some instances where it’s warranted. You have to get the ball rolling somehow. And how this DB got in such bad shape was definitely their fault, but their current view wasn’t. They had just been so used to things working the way they were that they didn’t see the need to change. They got their reports more or less when they expected them and even if they had to wait a couple extra hours for them they didn’t really mind because they understood the server was busy.

So what I did was just start by indexing a couple of huge #tables. I just picked a couple of the worst SPs and added a couple of indexes. Then I went in and started commenting out cursors and replacing them with simple join queries. Both of these made a huge difference. Then I just sat back and waited. You really don't want to go too far with something like this. When they started noticing that their 12-hour queries were coming back in just a few seconds, I had their attention. I was then able to convince them to let me go even further and start really tearing into some of these SPs.

And now, for the first time ever, we've got a near-realtime reporting effort in our company. They've come a long way from "I don't care if it takes 12 hours" to "I have to have it now". The problem is they still slip back into their old habits now and then. They currently want to implement an encryption solution that will take around 2 minutes to return for each report, when the solution I suggested returns in about 2 seconds. And sure, 2 minutes isn't really going to break the bank, but as those of you who have done a lot of query tuning know, you have to be hungry for resources. You have to treat every single CPU tick as a drop of water in the desert. If you don't, you'll wake up one day and be in the middle of the same shit you were in before. You have to fight over milliseconds and squeeze every last drop of performance out of every query you can find. That's the only way to run a real tuning effort.

But it’s amazing how politics and perception find their way into every aspect of your job.