Search Argument Case Study: WHERE the year was 2002…

Content Rating: Beginner, Tips

It’s entirely possible that you’re fiddling with your search arguments too much, and missing out on the boost a good index gives your query.  Here’s a quick demonstration of the power of a well-thought-out search clause on a date column.  For this example we’ll use AdventureWorks (freely available at Codeplex).

Let’s say we want to “SELECT SubTotal, VendorID FROM Purchasing.PurchaseOrderHeader AS POH”  for the year 2002. What’s the best WHERE clause for this statement?  Well, that kind of depends on the indexes, doesn’t it?

This query gives you all the indexes for this table (and lists the indexed and included columns):

SELECT
      I.name as IndexName
    , I.type_desc
    , C.name
    , IC.column_id
    , IC.is_included_column
FROM sys.indexes I
INNER JOIN sys.index_columns IC ON IC.index_id = I.index_id AND IC.object_id = i.object_id
INNER JOIN sys.columns C ON C.column_id = IC.column_id AND C.object_id = I.object_id
WHERE OBJECT_NAME(I.OBJECT_ID) = 'PurchaseOrderHeader'
    -- AND I.name = 'IX_PurchaseOrderHeader_VendorID'   -- You can also view just one index
ORDER BY I.index_id, is_included_column, IC.column_id; 

There isn’t currently an index that applies to an OrderDate search – we need an index whose first non-included column is OrderDate. Let’s assume we’re in a position to create a covering index for our query…that would look like this:

CREATE NONCLUSTERED INDEX ix_PurchaseOrderHeader_OrderDate
ON [Purchasing].[PurchaseOrderHeader] ([OrderDate])
INCLUDE (SubTotal, VendorID);

Now that we have a helpful covering index, we’re back to the question…what’s the best way to limit our query to just the year 2002?  Here are a few options:

  • WHERE CONVERT(VARCHAR(4), POH.OrderDate, 101) LIKE ‘%2002%’
    Always be wary of using a leading wildcard to search for a string; it means that SQL can’t use any applicable indexes. Think about it: I’ll hand you a copy of the SQL Server 2008 Bible (1,628 pages long), and you find me every entry in the index that contains the word “level” (‘%level%’).  That’s going to take a bit longer than finding every entry that STARTS with “level” (‘level%’). 
  • WHERE RIGHT(CONVERT(VARCHAR(4), POH.OrderDate, 101), 4) = ‘2002’
    Be very suspicious of searche clauses that convert one datatype to another, especially from something numerically based (like dates) to a string type.  Our first solution, above, has this same shortcoming.
  • WHERE YEAR(POH.OrderDate) = 2002
    This looks nifty, sure…but it still causes SQL to scan the index provided; this is why we don’t like using functions in WHERE and JOIN clauses where we can avoid it.  Also note: “WHERE DatePart(yy, POH.OrderDate) = 2002” behaves the exact same way and causes an index scan…it’s still a function in a WHERE clause!
  • WHERE POH.OrderDate BETWEEN ‘1/1/2002 00:00:00’ AND ‘1/1/2003 23:59:59’
    Ahhh, that’s better! Notice we’re treating a date as a date. THIS function causes an index seek.  SQL loves its datatypes unsullied wherever it can get them.

 The first two solutions cause a table scan (rather, a clustered index scan, which amounts to the same thing), and the third causes a nonclustered index scan.  The last solution allows SQL to make use of an appropriate index, and it’s an improvement of over 400% efficiency when compared to any of the other implementations.

Moral: When possible, use datatypes in serach conditions as Codd intended – organic and efficient.

Happy days,
Jen McCown
http://www.MidnightDBA.com/Jen

2 thoughts on “Search Argument Case Study: WHERE the year was 2002…

  1. Eric Humphrey

    Always a good tip. You can also use the other alternative where you don’t have to worry about the time portion on the last day. WHERE POH.OrderDate >= ’1/1/2002′ AND POH.OrderDate < ’1/1/2003′.

  2. John Halunen

    Great blog, but I have a question. Shouldn’t the where clause be:
    WHERE POH.OrderDate BETWEEN ’1/1/2002 00:00:00′ AND ’12/31/2002 23:59:59′

    or alternatively,

    WHERE POH.OrderDate BETWEEN ’1/1/2002 00:00:00′ AND ’1/1/2003 00:00:00′

    Or is there something about the date comparison I’m missing?

Comments are closed.