PASS Summit Homework Challenge

Tim Ford (blog, Twitter) tagged me thusly:

What have you implemented from the PASS Summit 2009?  Did you find yourself using the code available without tweaks?  If not, what did you have to do in order to get the code to function in your environment?  What do you still intend to implement? I ask these questions of Jeremiah Peschka (twitter | web), Andy Leonard (twitter | web), and Jen and Sean McCown (twitter | web).  I ask that they tag 2-3 more individuals with their response and link back to this original post.  The goal is to show both first-time Summit Attendees and Veterans alike (and their Managers/Directors) that there is value in what you bring back from the Summit in that wetware you call a brain.

I put this off till evening, and read it to Sean….and we said, well, we only got to 4 sessions between the two of us.  And while I still intend to go back and read my notes from Buck Woody's and Grant Fritchey's sessions – and in fact I have downloaded both session slides, and referenced them (especially on DMVs) – I can't say I've implemented code.  Not the advertisement we might've expected, eh?

However…you know how everyone who came back from #SQLPASS (that's right, I hashtagged it in a blog, baby) keeps banging on about networking, and friends, and meeting people, blahdy blah?  Yeah.  Since I've been back, I've tagged my SQLPass peeps 6 or 8 times on different SQL issues.  I got a great resource through Denny Cherry (@mrdenny) – authored by Bob Ward (@bobwardms) on AWE and 64-bit, got confirmation from Kim Tripp (@KimberlyLTripp) about nested views, and got a course syllabus from Buck Woody (@BuckWoody).  I attended my first SQL Lunch today.  I've also friended six new people on Facebook, all from the conference, and we're trading photos and memories. I announced the confirmed date for my user group's 2010 SQL Saturday, and almost instantly – through followers I gained at the Summit – got an inquiry about sponsorship.

What have I implemented?  Community.  Thanks, guys.

Tagbacks!  Let's see, how about Colin Stasiuk (@BenchmarkIT), Tim Mitchell (@Tim_Mitchell), and Kendal Van Dyke (@SQLDBA).

-Jen McCown, http://www.MidnightDBA.com

(Oh, and if that's too much of a cop-out answer, I swear I'll post again after I go through my notes and/or get the summit DVDs. 🙂

Optimized: Query by Hour, Day, Week, or Month

Here's the accompanying video – same material, different format.

A solution is ever so much more elegant if you know the path that led to it.  Which is another way of saying, you only know how totally smart I am if you see how much work I had to do 🙂

I had to do a lot of work today – we have a stored procedure that's being called a few times a second, and maxing out 8 processors on our beefy test server during a moderate load test.  So I dug in and had some fun. When it came to the final fine tuning, I was looking at a query full of date maninpulation.  That sort of thing is performance death, so I stepped through it.  We started with something like this, a query that allows the user to get a SUM by hour, day, week, or month.  Here we'll just show week and month for berevity:

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND (

— Interval: Week

( @Interval = 'wk'
AND DATEPART(wk, o.OrderDate) = DATEPART(wk, GetDate())

AND DATEPART(yyyy, o.OrderDate) = DATEPART(yyyy, GetDate())

)

— Interval: Month

OR ( @Interval = 'mm'
AND DATEPART(mm, o.OrderDate) = DATEPART(mm, GetDate())

AND DATEPART(yyyy, o.OrderDate) = DATEPART(yyyy, GetDate()) 

)
)
GROUP BY O.BizID

Principle #1: Free the Functions 

Right now we're looking disdainfully at all those DATEPARTs and GetDates().   Principle #1: Get functions out of the WHERE clause, if possible.  That's easy enough, we can declare @wk @mm and @yyyy variables and set them before our query. Let's also change this from OR to IF (for more on that, see my previous blog IF is better than OR), :

SELECT @wk =  DATEPART(wk, GetDate()) , @mm =  DATEPART(mm, GetDate()) , @yyyy =  DATEPART(yyyy, GetDate())

— Interval: Week 
IF @Interval = 'wk'

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND DATEPART(wk, o.OrderDate) = @wk

AND DATEPART(yyyy, o.OrderDate) = @yyyy
GROUP BY O.BizID
— Interval: Month
ELSE IF @Interval = 'mm'

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND DATEPART(mm, o.OrderDate) = @mm

AND DATEPART(yyyy, o.OrderDate) = @yyyy
GROUP BY O.BizID

Principle #2: Free the Optimizer 

That'll help a lot, but we're not through yet.  It occurrs to me that this still isn't an elegant solution.  The real brainstorm comes: If we compare date part to date part, we'll never get away from DATEPART.  What's better than that?  How about comparing the whole date to another whole date, allowing the optimizer to use that column's index?  Booya!  Principle #2, let the optimizer do its job (and that includes making indexes availble). Use @DateStart and @DateEnd instead…see:

        SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) — This gives us a clean date with time set at midnight, 00:00:00

— Interval: Week 

IF @Interval = 'wk'
BEGIN
                        SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,GETDATE()) ) + 1,@date)
                        SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)
SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND o.OrderDate BETWEEN @dateStart AND @DateEnd

GROUP BY O.BizID
END
— Interval: Month
ELSE IF @Interval = 'mm'
BEGIN
               SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date)
               SELECT  @dateEnd = DATEADD(m, 1, @dateStart)
SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND o.OrderDate BETWEEN @dateStart AND @DateEnd 
GROUP BY O.BizID
END

Principle #3: Happify Your Code 

We'll go through all that date math here in a minute, but first things first. Notice we've gotten all the funky functions and date manipulation out of our WHERE clause, where it'd mess up our friend the SQL optimizer, and placed it where it belongs: as a one-shot above the query itself.  Oh and hey, see how the two SELECT statements are identical?  Let's pull those out of the IF/ELSE and make the code more compact, readable, and easy to manage (oh hai, principle #3!):

SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) — This gives us a clean date with time set at midnight, 00:00:00

— Interval: Week 

IF @Interval = 'wk'
BEGIN
                        SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,GETDATE()) ) + 1,@date)
                        SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)
END
— Interval: Month
ELSE IF @Interval = 'mm'
BEGIN
               SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date)
               SELECT  @dateEnd = DATEADD(mm, 1, @dateStart)
END
SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND o.OrderDate BETWEEN @dateStart AND @DateEnd 
GROUP BY O.BizID

Now, let's sit back and have a plate of #bacon. Oh yeah, and let's go over that date math.

Date Math 

We were originally just thinking of matching month to month and year to year, for our first example.  But that's not set based logic.  To compare a date to a date, we needed a range to compare the column value to.  In our example we're always going for the current date range (e.g., THIS month, THIS week, etc).  So start with a clean, no-time @date:

SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) — This gives us a clean date with time set at midnight, 00:00:00

Today that gives us 11/16/2009.  Pseudocode: Use that to get the first of the month: Today's date, minus number of days (16), plus one = @date – datepart(dd,@date) + 1 = @date + (-1 * datepart(dd,@date)) + 1 =

SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date) 

And DateEnd is just midnight on the first day of next month, or DATEADD(m,1,@dateStart).  You can use this same kind of logic for hour:

                SELECT  @dateStart = DATEADD(hh, DATEPART(hh, GETDATE()),@date — Clean date at midnight, plus the current hour

                SELECT  @dateEnd = DATEADD(hh, 1, @dateStart— Start date plus 1 hour

And day is even simpler:

               SELECT  @dateStart = @date               
               SELECT  @dateEnd = DATEADD(dd, 1, @dateStart) 

For me, week was the tricky part, as we didn't wind up using the "week" date part at all. Pseudocode: start with the day of the week (Monday is 2, for today's example), and subtract that from today's date to get 0, add 1 to get Sunday of this week =   today – day of week(today) + 1 =

                SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,@date) ) + 1,@date)
                SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)

Nice, huh?

Review

Principles:

  • Principle #1: Get functions out of the WHERE clause, if possible.  They're performance death.  
  • Principle #2: Let the optimizer do its job.  Make sure indexes are available, and the query can use them.
  • Principle #3: If all else remains equal, use fewer lines of code….make code more compact, readable, and easy to manage.

The major 2 things we did for this query were to get rid of the ORs (see IF is better than OR), and change our thinking from comparing date parts, to comparing the whole OrderDate to a range of dates. 

Eat some #bacon. The end.

-Jen McCown, http://www.MidnightDBA.com

IF is better than OR

Here's the accompanying video – same material, different format. 

Here's a common mistake: using OR instead of IF.  Let's say you want to get sales by a time interval, either current week or current month.  This will not perform well:

IF @Interval = 'wk'

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND (

— Interval: Week

( @Interval = 'wk'

AND DATEPART(wk, o.OrderDate) = @wk

AND DATEPART(yyyy, o.OrderDate) = @yyyy

)

— Interval: Month

OR ( @Interval = 'mm'

AND DATEPART(mm, o.OrderDate) = @mm

AND DATEPART(yyyy, o.OrderDate) = @yyyy

)

)

GROUP BY O.BizID

This will perform much better:

— Interval: Week

IF @Interval = 'wk'

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND DATEPART(wk, o.OrderDate) = @wk

AND DATEPART(yyyy, o.OrderDate) = @yyyy

GROUP BY O.BizID

ELSE

— Interval: Month

IF @Interval = 'mm'

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND DATEPART(mm, o.OrderDate) = @mm

AND DATEPART(yyyy, o.OrderDate) = @yyyy

GROUP BY O.BizID

OR always widens the scope of the query; the optimizer has to walk through each and every OR statement to evaluate if it's true.  When you use IF, the optimizer immediately figures out what's true, and only has to deal with the one query, OR-free.

OR is not the same as IF.  Avoid OR when you can, and your queries will thank you.

-Jen McCown, http://www.MidnightDBA.com