Correlating Data in SQL Trace, Part 1

I’ve been chasing down a series of issues with a system. One of  the symptoms we’ve been getting is massive lock timeouts, so I’ve been taking (server side) profiler traces to see what code is requesting the locks that are being timed out.  Today I landed on a lovely solution for correlating the lock timeouts I see in the trace file with the associated code.  Let’s talk setup, and then we’ll walk through the thinking that led to our solution.

Setup

Take a server side trace, be sure to get “existing connections”, RPC:Starting, RPC:Completed, and Lock Timeout (Timeout > 0) (see the previous blog “Lock Timeout Revelation on Twitter”).  For the RPCs and Timeouts,  get your basic measures, plus ObjectName and DatabaseID.

Once you’ve captured enough of a trace, save it to a table, preferably on your local box. 

Depending on how much time you’re going to spend digging through this data, you may want to index your trace table.  Here are the indexes I developed for this query:

CREATE INDEX ix_trace ON trace (eventclass, spid, rownumber, databaseID, objectName) INCLUDE (starttime, endTime)

CREATE INDEX ix_trace2 ON trace (spid, eventclass, rownumber, databaseID, objectName)

Developing

Once you save the trace into a table, you'll need to know the event id (event class names aren't displayed). (*See footnote about trace in tables…) 

Trace event 189 is Lock Timeout (timeout > 0).  So when you find a 189, it’s easy to go back up the trace rows until you find the closest one with a corresponding SPID, which will have an eventclass of 11: RPC:Started.  Rough pseudocode:

1.       Find a timeout eventclass = 189

2.       Find the closest row with rownumber < timeout.rownumber and spid = timeout.spid

Part 1 is easy:

SELECT TOP 10
        [Timeout].RowNumber ,
        [Timeout].EventClass ,
        [Timeout].DatabaseID ,
        [Timeout].spid ,
        [Timeout].starttime ,
        [Timeout].endtime ,
FROM    Trace AS [Timeout]       
WHERE   [Timeout].EventClass = 189

Note that I’m limiting the query to the TOP 10 for now, just so I don’t have to look through 10,000 rows at once.

So Part 2: for each timeout, find the closest previous RPC Started with the same spid.  On first look, I started falling into the cursor trap.  You say it to yourself procedurally: “for each timeout” is actually direct pseudocode for a cursor. But don’t go there….just don’t…

What do we know about the timeout and its associated code call?

  • They have the same spid
  • They’re on the same database
  • The statement will have an RPC start (eventclass = 11), but not necessarily an RPC complete (eventclass = 10)
  • The statement will precede the timeout, and therefore have an earlier row number

So we could say that for a timeout, the corresponding statement is

  • the MAX rownum less than the timeout rownum
  • where the spids and databases match,
  • and eventclass=11. 

Well partner, that’s a correlated subquery right there! Pretend for a moment that we only have one lock timeout, SPID = 52, rownumber = 11,304.  So to find the code that goes with it:

SELECT TOP 1 rownumber
      FROM trace
WHERE spid = 52         –Timeout spid
AND rownumber < 11304   -–Timeout rownumber
AND eventclass = 11
ORDER BY rownumber DESC 

That’s our subquery. Let’s plug it into the part 1 query, and correlate the thing! 

SELECT TOP 10
        [Timeout].RowNumber ,
        [Timeout].EventClass ,
        [Timeout].DatabaseID ,
        [Timeout].spid ,
        [Timeout].starttime ,
        [Timeout].endtime ,
        Statement.*
FROM    Trace AS [Timeout]
        INNER JOIN Trace AS [Statement] ON Statement.spid = [Timeout].spid
               AND Statement.spid = [Timeout].spid
               AND Statement.DatabaseID = [Timeout].DatabaseID
               AND statement.rownumber = ( SELECT TOP 1 rownumber
                                  FROM trace
                                  WHERE spid = [Timeout].spid
                                  AND rownumber < [Timeout].rownumber
                                  AND eventclass = 11
                                  ORDER BY rownumber DESC
                                  )
WHERE   [Timeout].EventClass = 189

And finally, let’s do better with our select than just Statement.*:

SELECT TOP 10
        [Timeout].RowNumber ,
        [Timeout].EventClass ,
        [Timeout].DatabaseID ,
        [Timeout].spid ,
        [Timeout].starttime ,
        [Timeout].endtime ,
'STATEMENT:  ' AS spacer,
        Statement.rownumber ,
        Statement.objectname ,
        Statement.textdata,
        Statement.loginname ,
        Statement.spid ,
        Statement.starttime ,
        Statement.endtime
FROM    Trace AS [Timeout]
        INNER JOIN Trace AS [Statement] ON Statement.spid = [Timeout].spid
               AND Statement.spid = [Timeout].spid
               AND Statement.DatabaseID = [Timeout].DatabaseID
               AND statement.rownumber = ( SELECT TOP 1 rownumber
                                  FROM trace
                                  WHERE spid = [Timeout].spid
                                  AND rownumber < [Timeout].rownumber
                                  AND eventclass = 11
                                  ORDER BY rownumber DESC
                                  )
WHERE   [Timeout].EventClass = 189

I like the “STATEMENT” spacer; it’s a nice break between your timeout columns, and your statement columns.

I found another great application for this, but we’ve gone a bit long here. So, come back tomorrow for Correlating Data in SQL Trace, Part 2!

-Jen McCown

http://www.MidnightDBA.com

 

*Footnote: Again, I'll have to come back later with a link, but I also learned today that when you port your trace results into a table, duration is stored in microseconds (1/1,000,000 s), instead of milliseconds (1/1,000).  This seems like it would be pretty good info to have.   Edit: Linky! Here's the Brian Moran article "Duration and CPU Values" that mentions this.

Lock Timeout Revelation on Twitter

I was all proud of myself today, and bragged a little bit on Twitter about a query I'd written.  That query gets its own blog here shortly, but first, a worthwhile Twitter conversation with Adam Machanic (@AdamMachanic on Twitter, Adam's blog is on SQLBlog.com) that turned very, very useful:

Me: Okay, that cool query from yesterday that correlates lock timeouts to their queries? Yah, just modified to find… (1/2) 
Me: …RPC start statements that have no correlated RPC complete statements within a timeframe. Shows what's bunging up the CPU.(2/2) Will blog!

Adam: Why are you hitting lock timeouts on a regular enough basis to need such a query?

Me: BizTalk transactions + UPDLOCK on SELECT

Adam: BizTalk overrides the lock timeout setting?

Me: I don't think so (not sure), but we've got maxed CPU and 2 SPs that run again & again, lotsa waiting & timeouts.

Adam: It sounds like you're referring to "client" timeouts, not "lock" timeouts, if I'm understanding what you're talking about.

Me: In a SS-profiler trace, we have tons of lock timeouts (eventclass 27) showing up. I can associate those w/the SPIDs (1/2)
Me: …that were holding the locks when they timed out. Tells me what's causing the lock contention, which I suspect is (ok 2/3)
Me: contributing to the CPU maxout. Sound crazy? (3/3)

Adam: Ah, I see what you're doing. I don't think a lock probe is very expensive but I guess if you're getting millions it adds up.

Me: Suppose emphasis should be more on the high CPU ,& longrunning queries that're CAUSING the timeouts, not the timeouts thmselvs

Adam: Is the duration on the events > 0? If not, they're not 'really' timeouts, just an indication of a lock wait.

Me: HOLY MOTHER OF POTATOES! They're all waits…I so didn't know that, thx

Adam: Glad to help 🙂

Me: I even RTFM, and BOL doesn't make that clear. I can see now how the article would mean that. But now I hav acheived clarity!!

Adam: Glad to help 🙂
Adam: "I was just sitting here, eating my muffin, drinking my coffee, when I had what alcoholics refer to as a moment of clarity."


After I wrote the "HOLY MOTHER OF POTATOES" line, I went and looked up Lock Timeouts in BOL.  And yes, it does mention this, but it's terribly cryptic to the untrained eye (like mine):

Lock:Timeout events with a duration of 0 are commonly the result of internal lock probes and are not necessarily an indication of a problem. The Lock:Timeout (timeout > 0) event can be used to ignore time-outs with a duration of 0.

Yeah, I read that two years ago, and disregarded it as making no sense. 

So there's our tip for the day! Lock Timouts with duration = 0 aren't really timeouts.  And, it's a good segue into the next blog, so…

-Jennifer McCown

http://www.MidnightDBA.com

2009 Really Knew Where its Towel Was

I've decided to jump on the new year blog post bandwagon.  2009 was one excellent year for MidnightDBA.com, and for the MidnightDBAs.

MidnightDBA.com

We'll start with me.  In 2009, I started blogging on MidnightDBA.com. I got my first SQL 2008 certification, the MCTS in Database Development. My goal is to get MCITP in the next couple of months, by the way.  In August I started back to work, and in October I won the SQLServerPedia contest…that got me a trip to the SQLPASS Summit in Seattle!  Just a couple of personal notes, and then back to the geeking out: Sean and I celebrated our 5th anniversary this year with a great trip to Los Angeles, and my sweet kidlets turned 10, 4, and 2.  I'm also apparently taking up the guitar (got one for xmas), which I've always wanted to do.

Sean had a huge year: In January, he was awarded his first Microsoft MVP award!!  (Last week, he was awarded his second!)  He did his first OReilly webcast, Ground-Zero PowerShell for the SQL Server DBA (I'll find the YouTube link later, when it's not blocked).  And he completed his first set of training videos for TrainingSpot.com: Beginning Powershell for DBAs 101.  He's still writing (SQL R2 article) and blogging at InfoWorld.com. In August, “PR In Your Pajamas” interviewed Sean!

On a personal note: Sean started working from home this year, which means he gets to stay home with our littlest.  I'm incredibly lucky to have stayed home with our kids, and that the youngest gets to stay home with his Dad instead of us farming him out to friends or daycare.  I'm not being all LIKE THAT, I'm only saying that in my case, I'm much happier this way.

As for MidnightDBA: Sean did a complete overhaul of the site functionality and look last year, using a SQL Server backend and PowerShell in the push process.  We started the new video format "DBAs at Midnight", put up a bunch of goofy shirts on Zazzle.com, and started publishing our training videos to iTunes (search for Midnight DBA).  I jumped on Twitter as a marketing tool for MidnightDBA, and found much, much more.  We had a ready-made set of friends and contacts when we hit PASS in November, and it even helped us coordinate our PASS interviews.  If you haven't seen our PASS interviews, you are totally missing out…a bunch of the biggest names sat down and got all hoopy with us.

The MidnightDBAs

One of the weirdest things about 2009 was that I became the face of MidnightDBA for a lot of people. I think that's because I play the role of publicist, posting updates on Twitter and blogging on MidnightDBA proper.  During PASS, I updated the Twitter avatar to a picture of me and Sean, just so people would start realizing that I'm not THE MidnightDBA.

ITBookworm.com

To put it a better way: Sean started ITBookworm and MidnightDBA years ago, and I've only had much to do with it for the last 10 months or so.  Fully 80% of the content out there is his, the idea was his, and all the development was his (okay, I mostly designed the db). It's been a great experience to join forces and be a MidnightDBA. Another place I'm lucky is in having a partner who I can work with, be creative with, and have fun with.  So to get all gooshy in public: Sean, thanks for everything.  2009 was great because of you.

Okay, let's walk that off.

In December, we started a new podcast called the MidnightDBA Blogcast, where Sean or I (mostly I) read our blogs, or a guest blog.  We've only published one episode, and the Blogcast is already getting good reviews.  I'm thrilled.

I'm happy.  Happy with my family, my friends, my Tweeps, my career, and everything that goes along with it.  Cheers to you, 2009.  2010: Happy days.

-Jen McCown

http://www.MidnightDBA.com