
Diabetic Servers

One question I get a lot is about performance and how systems can run well for years and then suddenly just stop performing well.  That’s an understandable question and one that’s both easy and complicated to answer.

The short answer is that there’s a huge margin for error in these types of things.  However, the margin is only huge if you’re not doing anything about it.  Let me explain.

It’s like being on a diet. When you’re watching what you eat, every little bit matters. That extra helping of fries, that bowl of ice cream, and that soda are all death to a strict diet. Even little things can be harmful when you’re dieting, and the stricter the diet, the more the little things matter. That’s why professional athletes of all kinds watch their intake like hawks; in their case, an extra ounce of potatoes or an extra ounce of meat can really make a difference. And that’s not even to mention diabetics and other people on strict medical diets. Think about someone with severely high blood pressure: their diet is extremely important, and the slightest wrong food can have serious blowback on their system.

Now look at someone who’s already grossly overweight. This guy eats whatever he likes, up to thousands of extra calories a day. He eats only fried and fatty foods and eats as much of them as he likes. So that extra helping of ice cream or those extra few fries really don’t matter much on top of everything else. That’s not to say they don’t have a cumulative effect, just that day to day they don’t matter much. Eventually, though, it will take its toll as he gets heavier and heavier and starts to feel the health effects. So while those extra fries do eventually catch up with him, they don’t cause any real immediate effect on top of all the other stuff he’s eating.

Well, that’s much the way it is with servers too. If you have a bunch of stuff that runs kinda poorly, or just not as well as it could, it doesn’t really matter on a daily basis because the server itself runs slow, and what’s one more mediocre process going to hurt? So a server can run like that for quite a while and nobody will ever really notice the difference. Part of the problem is that so few people bother to investigate better ways to do things, so they get used to their DB performing slowly. It’s not necessarily their fault, and these things can sneak up on them. Even a fairly good DBA can have wrong decisions go undiagnosed for a long time; the poor performance sneaks up on him, and the next thing he knows his system is just dragging. And it’s hard to go back and find that one thing that started it all. I typically find that performance problems are systemic. By that I mean that whatever mistake is made, it’s made throughout the whole system. It’s quite often not an isolated incident, unless someone new comes into a shop where things are already running smoothly.

So anyway, a server can put up with a good deal of abuse before it goes belly-up, but it will eventually happen.  What you want to try to get to is a point where you can treat your server like it’s got diabetes.  You want it on a very strict code diet where you watch every single I/O, every single MB going into RAM, every CPU cycle, etc.  On servers like this, one single process that doesn’t behave can have a noticeable effect on many other processes and you’ll know you did something wrong right away.  But if you’ve got a system that’s slow to begin with, who’s really going to notice if it’s running a little slower or if the CPU is hitting 81% instead of 76%?
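
If you want a quick way to see who’s actually drinking the water, so to speak, SQL Server’s DMVs will tell you. This is just a rough sketch of the kind of check I mean (the TOP 10 cutoff and the CPU ordering are arbitrary choices for the example, not anything from a specific system):

-- A quick sketch: the top CPU burners since the last restart, pulled from the plan cache.
-- sys.dm_exec_query_stats and sys.dm_exec_sql_text ship with SQL Server 2005 and up.
SELECT TOP 10
       qs.execution_count,
       qs.total_worker_time / 1000 AS total_cpu_ms,
       qs.total_logical_reads,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                        WHEN -1 THEN DATALENGTH(st.text)
                        ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;

Run something like that on a tight, well-behaved server and the numbers are boring. Run it on a fat one and a handful of statements will be hogging most of the CPU and reads.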

This is why I’m always stressing performance considerations on servers, and it’s also why I keep getting asked this question.

This of course has to play hand in hand with tomorrow’s post on reasonable performance.

The Untunable Database

There are some DBs that just can’t be tuned any more than they already are (or aren’t). A good example of this is an application that hits a DB and never qualifies any of its queries. They all hit with SELECT * and no WHERE clause. There’s really nothing you can do to improve performance short of just throwing more spindles at it. But that’s not really what I’m thinking about right now. What I’ve got on my mind now is an application and DB that just can’t be tuned no matter what you do, because the business owners don’t see the benefit of making the changes.
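
Just to put a concrete shape on it, the pattern looks something like the pair of queries below. The table and column names are made up for illustration, but the difference is the whole point: the first asks for everything, the second asks only for what the screen actually needs.

-- What the app sends: every column, every row, no filter.
SELECT * FROM dbo.Orders;

-- What it actually needs: a handful of columns and a date range.
SELECT OrderID, CustomerID, OrderDate, TotalDue
FROM dbo.Orders
WHERE OrderDate >= '20090101';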

I saw that a lot when I first got to my current gig. We had queries doing horrendous things and taking several hours to return and nobody cared. The end users had been running these queries for years and were happy with them. They didn’t care that the server was maxed out all the time and that they had to wait 12hrs for a report to return. Now, I don’t have to tell you that as a DBA that just drives me insane. Not to mention that it gives me nothing to do. Why am I even here then?

So with that in mind, I had to go a little cowboy on them and just start making minor changes that proved my point. I really can’t stress enough that I’m against going cowboy on any DB and I don’t care who you are. But there are some instances where it’s warranted. You have to get the ball rolling somehow. And how this DB got in such bad shape was definitely their fault, but their current view wasn’t. They had just been so used to things working the way they were that they didn’t see the need to change. They got their reports more or less when they expected them and even if they had to wait a couple extra hours for them they didn’t really mind because they understood the server was busy.

So what I did was just start by indexing a couple of huge #tables. I picked a couple of the worst SPs and added a couple of indexes. Then I went in and started commenting out cursors and replacing them with simple join queries. Both of these made a huge difference. Then I just sat back and waited; you really don’t want to go too far with something like this. When they started noticing that their 12hr queries were coming back in just a few secs, I had their attention. I was then able to convince them to let me go even further and really start tearing into some of these SPs.
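
To give you an idea of the flavor of those changes (the table and column names here are invented for the example, but the pattern is exactly this), it was mostly two moves: slap an index on the big #table before anything joins to it, and replace the row-by-row cursor with one set-based statement.

-- Index the big temp table before the rest of the proc hammers it with joins.
CREATE CLUSTERED INDEX IX_WorkOrders_CustomerID ON #WorkOrders (CustomerID);

-- The cursor version walked #WorkOrders a row at a time and updated each total
-- individually. The set-based replacement does the same work in one statement.
UPDATE w
SET    w.TotalDue = t.TotalDue
FROM   #WorkOrders AS w
JOIN  (SELECT CustomerID, SUM(Amount) AS TotalDue
       FROM dbo.OrderDetail
       GROUP BY CustomerID) AS t
       ON t.CustomerID = w.CustomerID;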

And now, for the first time ever, we’ve got a near-realtime reporting effort in our company. They’ve come a long way from ‘I don’t care if it takes 12hrs’ to ‘I have to have it now’. The problem is they still slip back into their old habits now and then. They currently want to implement an encryption solution that will take around 2mins to return for each report when the solution I suggested returns in about 2secs. And sure, 2mins isn’t really going to break the bank, but as those of you who have done a lot of query tuning should know, you have to be hungry for resources. You have to treat every single CPU tick as a drop of water in the desert. If you don’t, you’ll wake up one day and be in the middle of the same shit you were in before. You have to fight over milliseconds and squeeze every last drop of performance out of every query you can find. That’s the only way to run a real tuning effort.

But it’s amazing how politics and perception find their way into every aspect of your job.