An interesting log truncation case

Here’s a good one for you guys. I got a call from one of my DBAs today that they’re having trouble with the logs on one server not truncating. They’ve brought the server space to a critical level, and it needs to be fixed. The problem is that the DBA had looked at the log backups and they were fine. The log backup job was running just fine, and there were no active transactions.

I asked if there were any DBs being replicated… no.
OK, I’ll look into it.

So I connect and start my investigation where I always do for log truncation problems.
SELECT name, log_reuse_wait_desc from sys.databases

This tells me exactly what the log is waiting for before it can truncate. It’s really a matter what what your goal is. Some people love the troubleshooting process so much that they want to drag it out as long as they can. So they hunt around looking at different things to try to find what the problem is. They gather evidence and try different things until the problem is solved. I’m not that way though. I want to find the issue ASAP and get on to other things. So if you really enjoy the hunt then keep doing what you’re doing. But if you want to solve the problem quickly, then use the above query and get on with your life.

So anyway, the log was waiting on a log backup. I then checked the log backup job and it had been running just fine all day. So ok, the log is being backed up, but SQL still thinks that it’s waiting on a log backup before it can truncate the log.

This is where a knowledge of SQL can come in. At this point there are only 2 things that could architect this situation. Because all the DBs on the server are in the same boat with their logs filling up. So either the backup isn’t actually running, or it’s running with the copy_only flag.

A quick look at the job properties told me exactly what I needed to know. Looking at the execution time of the different runs, the job finishes in about 10secs. That seems a little fast for me on a server that has like 200 DBs on it. So looking back at the history 2 days ago the job was taking 9mins.

At this point, I knew exactly what the problem was. I looked in the SP that runs the backup, and the line that actually runs the backup had been commented out. Someone was trying to make a simple change and commented out the wrong line in the SP.

Look, I’m not smarter than your average DBA, and I’m not necessarily better educated. What I do however, is follow a predictable pattern of troubleshooting, and actually pay attention to the evidence. Sometimes the evidence isn’t clear and you have to make guesses, but most of the time it is.
I understand that the thrill of the hunt keeps you going, but some things are easy enough that you should just get them done. Why a log won’t truncate is such an easy thing to diagnose it should just be commonplace for you. Search somewhere else for your troubleshooting fix.

3 thoughts on “An interesting log truncation case”

  1. I love the sys.databases view– that thing is so darn useful.

    It’s funny, one of the first things I typically do with anything that involves problems with a tlog for a database in full recovery is to simply look at the backup files themselves. The SQL Agent job history is just full of lies at times, so I want to see those files! Not only that they’re there, but also whether or not the sizes are normal are not. And if the files aren’t where they’re supposed to be, well then, that’s a problem, too. 🙂

Comments are closed.