Optimized: Query by Hour, Day, Week, or Month

Here's the accompanying video – same material, different format.

A solution is ever so much more elegant if you know the path that led to it.  Which is another way of saying, you only know how totally smart I am if you see how much work I had to do 🙂

I had to do a lot of work today – we have a stored procedure that's being called a few times a second, and maxing out 8 processors on our beefy test server during a moderate load test.  So I dug in and had some fun. When it came to the final fine tuning, I was looking at a query full of date maninpulation.  That sort of thing is performance death, so I stepped through it.  We started with something like this, a query that allows the user to get a SUM by hour, day, week, or month.  Here we'll just show week and month for berevity:

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND (

— Interval: Week

( @Interval = 'wk'
AND DATEPART(wk, o.OrderDate) = DATEPART(wk, GetDate())

AND DATEPART(yyyy, o.OrderDate) = DATEPART(yyyy, GetDate())

)

— Interval: Month

OR ( @Interval = 'mm'
AND DATEPART(mm, o.OrderDate) = DATEPART(mm, GetDate())

AND DATEPART(yyyy, o.OrderDate) = DATEPART(yyyy, GetDate()) 

)
)
GROUP BY O.BizID

Principle #1: Free the Functions 

Right now we're looking disdainfully at all those DATEPARTs and GetDates().   Principle #1: Get functions out of the WHERE clause, if possible.  That's easy enough, we can declare @wk @mm and @yyyy variables and set them before our query. Let's also change this from OR to IF (for more on that, see my previous blog IF is better than OR), :

SELECT @wk =  DATEPART(wk, GetDate()) , @mm =  DATEPART(mm, GetDate()) , @yyyy =  DATEPART(yyyy, GetDate())

— Interval: Week 
IF @Interval = 'wk'

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND DATEPART(wk, o.OrderDate) = @wk

AND DATEPART(yyyy, o.OrderDate) = @yyyy
GROUP BY O.BizID
— Interval: Month
ELSE IF @Interval = 'mm'

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND DATEPART(mm, o.OrderDate) = @mm

AND DATEPART(yyyy, o.OrderDate) = @yyyy
GROUP BY O.BizID

Principle #2: Free the Optimizer 

That'll help a lot, but we're not through yet.  It occurrs to me that this still isn't an elegant solution.  The real brainstorm comes: If we compare date part to date part, we'll never get away from DATEPART.  What's better than that?  How about comparing the whole date to another whole date, allowing the optimizer to use that column's index?  Booya!  Principle #2, let the optimizer do its job (and that includes making indexes availble). Use @DateStart and @DateEnd instead…see:

        SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) — This gives us a clean date with time set at midnight, 00:00:00

— Interval: Week 

IF @Interval = 'wk'
BEGIN
                        SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,GETDATE()) ) + 1,@date)
                        SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)
SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND o.OrderDate BETWEEN @dateStart AND @DateEnd

GROUP BY O.BizID
END
— Interval: Month
ELSE IF @Interval = 'mm'
BEGIN
               SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date)
               SELECT  @dateEnd = DATEADD(m, 1, @dateStart)
SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND o.OrderDate BETWEEN @dateStart AND @DateEnd 
GROUP BY O.BizID
END

Principle #3: Happify Your Code 

We'll go through all that date math here in a minute, but first things first. Notice we've gotten all the funky functions and date manipulation out of our WHERE clause, where it'd mess up our friend the SQL optimizer, and placed it where it belongs: as a one-shot above the query itself.  Oh and hey, see how the two SELECT statements are identical?  Let's pull those out of the IF/ELSE and make the code more compact, readable, and easy to manage (oh hai, principle #3!):

SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) — This gives us a clean date with time set at midnight, 00:00:00

— Interval: Week 

IF @Interval = 'wk'
BEGIN
                        SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,GETDATE()) ) + 1,@date)
                        SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)
END
— Interval: Month
ELSE IF @Interval = 'mm'
BEGIN
               SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date)
               SELECT  @dateEnd = DATEADD(mm, 1, @dateStart)
END
SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND o.OrderDate BETWEEN @dateStart AND @DateEnd 
GROUP BY O.BizID

Now, let's sit back and have a plate of #bacon. Oh yeah, and let's go over that date math.

Date Math 

We were originally just thinking of matching month to month and year to year, for our first example.  But that's not set based logic.  To compare a date to a date, we needed a range to compare the column value to.  In our example we're always going for the current date range (e.g., THIS month, THIS week, etc).  So start with a clean, no-time @date:

SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) — This gives us a clean date with time set at midnight, 00:00:00

Today that gives us 11/16/2009.  Pseudocode: Use that to get the first of the month: Today's date, minus number of days (16), plus one = @date – datepart(dd,@date) + 1 = @date + (-1 * datepart(dd,@date)) + 1 =

SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date) 

And DateEnd is just midnight on the first day of next month, or DATEADD(m,1,@dateStart).  You can use this same kind of logic for hour:

                SELECT  @dateStart = DATEADD(hh, DATEPART(hh, GETDATE()),@date — Clean date at midnight, plus the current hour

                SELECT  @dateEnd = DATEADD(hh, 1, @dateStart— Start date plus 1 hour

And day is even simpler:

               SELECT  @dateStart = @date               
               SELECT  @dateEnd = DATEADD(dd, 1, @dateStart) 

For me, week was the tricky part, as we didn't wind up using the "week" date part at all. Pseudocode: start with the day of the week (Monday is 2, for today's example), and subtract that from today's date to get 0, add 1 to get Sunday of this week =   today – day of week(today) + 1 =

                SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,@date) ) + 1,@date)
                SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)

Nice, huh?

Review

Principles:

  • Principle #1: Get functions out of the WHERE clause, if possible.  They're performance death.  
  • Principle #2: Let the optimizer do its job.  Make sure indexes are available, and the query can use them.
  • Principle #3: If all else remains equal, use fewer lines of code….make code more compact, readable, and easy to manage.

The major 2 things we did for this query were to get rid of the ORs (see IF is better than OR), and change our thinking from comparing date parts, to comparing the whole OrderDate to a range of dates. 

Eat some #bacon. The end.

-Jen McCown, http://www.MidnightDBA.com