Remix! Optimized: Query by Hour, Day, Week, or Month

I originally posted this on 11/16/09. I repost here for the joy of TSQL Tuesday, and to add a very important change to the original solution.

Here's the accompanying video – same material, different format. The video doesn't include the change, by the way.

A solution is ever so much more elegant if you know the path that led to it.  Which is another way of saying, you only know how totally smart I am if you see how much work I had to do 🙂

I had to do a lot of work today – we have a stored procedure that's being called a few times a second, and maxing out 8 processors on our beefy test server during a moderate load test.  So I dug in and had some fun. When it came to the final fine tuning, I was looking at a query full of date maninpulation.  That sort of thing is performance death, so I stepped through it.  We started with something like this, a query that allows the user to get a SUM by hour, day, week, or month.  Here we'll just show week and month for berevity:

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND (

— Interval: Week

( @Interval = 'wk'
AND DATEPART(wk, o.OrderDate) = DATEPART(wk, GetDate())

AND DATEPART(yyyy, o.OrderDate) = DATEPART(yyyy, GetDate())

)

— Interval: Month

OR ( @Interval = 'mm'
AND DATEPART(mm, o.OrderDate) = DATEPART(mm, GetDate())

AND DATEPART(yyyy, o.OrderDate) = DATEPART(yyyy, GetDate()) 

)
)
GROUP BY O.BizID

Principle #1: Free the Functions 

Right now we're looking disdainfully at all those DATEPARTs and GetDates().   Principle #1: Get functions out of the WHERE clause, if possible.  That's easy enough, we can declare @wk @mm and @yyyy variables and set them before our query. Let's also change this from OR to IF (for more on that, see my previous blog IF is better than OR), :

SELECT @wk =  DATEPART(wk, GetDate()) , @mm =  DATEPART(mm, GetDate()) , @yyyy =  DATEPART(yyyy, GetDate())

— Interval: Week 
IF @Interval = 'wk'

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND DATEPART(wk, o.OrderDate) = @wk

AND DATEPART(yyyy, o.OrderDate) = @yyyy
GROUP BY O.BizID
— Interval: Month
ELSE IF @Interval = 'mm'

SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND DATEPART(mm, o.OrderDate) = @mm

AND DATEPART(yyyy, o.OrderDate) = @yyyy
GROUP BY O.BizID

Principle #2: Free the Optimizer 

That'll help a lot, but we're not through yet.  It occurrs to me that this still isn't an elegant solution.  The real brainstorm comes: If we compare date part to date part, we'll never get away from DATEPART.  What's better than that?  How about comparing the whole date to another whole date, allowing the optimizer to use that column's index?  Booya!  Principle #2, let the optimizer do its job (and that includes making indexes availble). Use @DateStart and @DateEnd instead…see:

        SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) — This gives us a clean date with time set at midnight, 00:00:00

— Interval: Week 

IF @Interval = 'wk'
BEGIN
                        SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,GETDATE()) ) + 1,@date)
                        SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)
SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND o.OrderDate BETWEEN @dateStart AND @DateEnd

GROUP BY O.BizID
END
— Interval: Month
ELSE IF @Interval = 'mm'
BEGIN
               SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date)
               SELECT  @dateEnd = DATEADD(m, 1, @dateStart)
SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND o.OrderDate BETWEEN @dateStart AND @DateEnd 
GROUP BY O.BizID
END

Principle #3: Happify Your Code 

We'll go through all that date math here in a minute, but first things first. Notice we've gotten all the funky functions and date manipulation out of our WHERE clause, where it'd mess up our friend the SQL optimizer, and placed it where it belongs: as a one-shot above the query itself.  Oh and hey, see how the two SELECT statements are identical?  Let's pull those out of the IF/ELSE and make the code more compact, readable, and easy to manage (oh hai, principle #3!):

SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) — This gives us a clean date with time set at midnight, 00:00:00

— Interval: Week 

IF @Interval = 'wk'
BEGIN
                        SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,GETDATE()) ) + 1,@date)
                        SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)
END
— Interval: Month
ELSE IF @Interval = 'mm'
BEGIN
               SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date)
               SELECT  @dateEnd = DATEADD(mm, 1, @dateStart)
END
SELECT SUM(o.Sale) AS Sales

FROM Orders o

WHERE o.BizID = @BizID

AND o.OrderDate BETWEEN @dateStart AND @DateEnd 
GROUP BY O.BizID

Now, let's sit back and have a plate of #bacon. Wait! We still have date math, and the big change…

Date Math 

We were originally just thinking of matching month to month and year to year, for our first example.  But that's not set based logic.  To compare a date to a date, we needed a range to compare the column value to.  In our example we're always going for the current date range (e.g., THIS month, THIS week, etc).  So start with a clean, no-time @date:

SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) — This gives us a clean date with time set at midnight, 00:00:00

Today that gives us 11/16/2009.  Pseudocode: Use that to get the first of the month: Today's date, minus number of days (16), plus one = @date – datepart(dd,@date) + 1 = @date + (-1 * datepart(dd,@date)) + 1 =

SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date) 

And DateEnd is just midnight on the first day of next month, or DATEADD(m,1,@dateStart).  You can use this same kind of logic for hour:

                SELECT  @dateStart = DATEADD(hh, DATEPART(hh, GETDATE()),@date — Clean date at midnight, plus the current hour

                SELECT  @dateEnd = DATEADD(hh, 1, @dateStart— Start date plus 1 hour

And day is even simpler:

               SELECT  @dateStart = @date               
               SELECT  @dateEnd = DATEADD(dd, 1, @dateStart) 

For me, week was the tricky part, as we didn't wind up using the "week" date part at all. Pseudocode: start with the day of the week (Monday is 2, for today's example), and subtract that from today's date to get 0, add 1 to get Sunday of this week =   today – day of week(today) + 1 =

                SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,@date) ) + 1,@date)
                SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)

Nice, huh?

One More Thing: The Big Change

That'd be perfect and beautiful and awesome, except for one thing. The way we have this set up, we'll get orders back for the start of the next interval.  For example, if we want orders for the 10:00 hour, the time for our end date is 11:00…if an order was placed at precisely 11:00:00.000, it'll return. We don't want that…we want rows for orders placed up to BUT NOT INCLUDING the start of the next hour.  To enforce this, we subtract 2 ms from our end date.

We don't subtract 1 ms from the time because of an interesting behavior of SQL Server: if you subtract 1 millisecond (SELECT DATEADD(ms,-1,'2009-01-01 12:00:00')), you'll notice that it returns '2009-01-01 12:00:00.000' instead of '2009-01-01 11:59:59.999'.  But if you subtract 2 milliseconds (SELECT DATEADD(ms,-2,'2009-01-01 12:00:00')), it returns '2009-01-01 11:59:59.997' instead of '2009-01-01 11:59:59.998'. 

The explanation can be found in Kimberly Tripp's article on Partitioned Tables and Indexes in SQL Server 2005:

"…datetime data does not guarantee precision to the millisecond. Instead, datetime data is precise within 3.33 milliseconds. In the case of 23:59:59.999, this exact time tick is not available and instead the value is rounded to the nearest time tick that is 12:00:00.000 A.M. of the following day."

So, our final solution just needs a minor modification to @endDate:

                SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,@date) ) + 1,@date)
                SELECT  @dateEnd = DATEADD(ms,-2,DATEADD(dd, 7, @dateStart))

The end.

Review

Principles:

  • Principle #1: Get functions out of the WHERE clause, if possible.  They're performance death.  
  • Principle #2: Let the optimizer do its job.  Make sure indexes are available, and the query can use them.
  • Principle #3: If all else remains equal, use fewer lines of code….make code more compact, readable, and easy to manage.
  • And don't forget to exclude the start of the next time interval with Dateadd (ms, -2, yourdate)

The major 2 things we did for this query were to get rid of the ORs (see IF is better than OR), and change our thinking from comparing date parts, to comparing the whole OrderDate to a range of dates. 

Eat some #bacon. The end. Really.

-Jen McCown, http://www.MidnightDBA.com