T-SQL Tuesday #016: Aggregates and Windowing Functions and Ranking! Yum!

T-SQL Tuesday #016Let’s get this out of the way right now: I am SO going to take advantage of this T-SQL Tuesday on aggregates to talk about OVER() and PARTITION BY. Aggregates and OVER go together like french fries and catsup…sure, you can have them apart, but together they’re just magical.

Note: If you’re not at all familiar with OVER (PARTITION BY) at all, take a quick look at my brief intro, Tip: Over and Partition By.

In my T-SQL Brush-up session (also known as the Forgotten SQL session), I walk my way through fairly random examples set in the AdventureWorks database, and miraculously end up somewhere sensible. 

(Tons more good talk and code after the jump…)

Select With Aggregates

Let’s start with a simple SELECT statement, chock full of aggregates:

/*****************************************************************************
------ Aggregates ------
This example gets the count, min/max/avg cost, and category of Products.
****************************************************************************/
SELECT
    COUNT(*) ItemCount ,
    MIN(P.ListPrice) MinCost ,
    MAX(P.ListPrice) MaxCost ,
    AVG(P.ListPrice) AvgCost ,
    SC.Name [Category]
FROM Production.Product P
LEFT OUTER JOIN Production.ProductSubCategory SC
    ON P.ProductSubCategoryID = SC.ProductSubCategoryID
WHERE P.ProductSubCategoryID IS NOT NULL
    AND P.Color IS NOT NULL
GROUP BY SC.Name
ORDER BY SC.Name;

That looks easy enough! We’re just getting some item COUNT, MIN/MAX/AVG cost action from our products table. 

Rows and Aggregations, Side by Side

 But we need a query for a report on individual items, not a summary. We want all the individual data alongside the aggregate data for each group. Enter the OVER (PARTITION BY) clause!

/*******************************************************************************
** OVER allows you to add aggregate columns within a query, without a GROUP BY. **
*******************************************************************************/
SELECT
    P.ListPrice ,
    COUNT(*) OVER ( PARTITION BY SC.Name ) [Item Count Per Category] , -- How many items IN EACH GROUP (SC.Name is category)
    MIN(P.ListPrice) OVER ( PARTITION BY SC.Name ) [Min Cost Per Category] ,    -- Minimum price IN EACH GROUP
    MAX(P.ListPrice) OVER ( PARTITION BY SC.Name) [Max Cost Per Category],    -- Maximum price IN EACH GROUP
    AVG(P.ListPrice) OVER ( ) [Average Cost Overall],    -- Average price over the whole rowset
    Count(P.ListPrice) OVER ( ) [Count Overall] ,  -- Count of items over the whole rowset
    SC.Name [Category] ,
    P.ProductNumber
FROM Production.Product P
LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID
WHERE P.ProductSubCategoryID IS NOT NULL
    AND P.Color IS NOT NULL;

A partial resultset looks like this:

ListPrice Item Count Per Category Min Cost Per Category Max Cost Per Category Average Cost Overall Count Overall Category ProductNumber
89.99 3 89.99 89.99 885.3284 245 Bib-Shorts SB-M891-S
89.99 3 89.99 89.99 885.3284 245 Bib-Shorts SB-M891-M
89.99 3 89.99 89.99 885.3284 245 Bib-Shorts SB-M891-L
106.5 2 106.5 106.5 885.3284 245 Brakes RB-9231
106.5 2 106.5 106.5 885.3284 245 Brakes FB-9873

Notice that this query has no GROUP BY, because we want to see all the individual prodcts. This is, of course, very cool and convenient.

Ranking with Row_Number

Now, let’s melt the cheddar cheesy goodness of Row_Number over the top of this query…we’ll rank each product based on price, within its subcategory.  So for example,  all the gloves will be ranked from cheapest to most expensive, 1 to however many gloves we sell:

/*****************************************************************************
------ Ranking ------
******************************************************************************/
SELECT
     ROW_NUMBER() OVER ( PARTITION BY SC.Name ORDER BY P.ListPrice ) RankNum ,
    COUNT(*) OVER ( PARTITION BY SC.Name ) [Count],
    P.ProductNumber,
    P.ListPrice ,
    SC.Name [Category]
FROM Production.Product P
JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID
WHERE P.ProductSubCategoryID IS NOT NULL
    AND P.Color IS NOT NULL;

This gives us a rowset like this (partial):

RankNum Count ProductNumber ListPrice Category
1 3 CS-4759 175.49 Cranksets
2 3 CS-6583 256.49 Cranksets
3 3 CS-9183 404.99 Cranksets
1 2 FD-2342 91.49 Derailleurs
2 2 RD-2308 121.46 Derailleurs
1 6 GL-H102-S 24.49 Gloves
2 6 GL-H102-M 24.49 Gloves

 Let’s take a quick step back. What we have here is a list of all products, grouped by category and ranked from cheapest to most expensive (within category). What could we do with that kind of information? Well, how about we put together a package deal for the salespeople to use….say, a “Frugal Shopper Special” of the least expensive item from each category we sell!

CTE FTW

There’s no way to ferret out this information with a basic SELECT-FROM-WHERE query. You can’t say “WHERE Row_Number() = 1” (because windowed functions can only appear in the SELECT or ORDER BY), or “WHERE RankNum = 1” (because column level aliases can only appear in the ORDER BY clause).  We have to make the SQL engine pretend that this rowset is a persisted table.  While, yes, you could load this data into a temporary table or table variable, we know (in our particular circumstance) that we won’t need to access that data again. Why not use a CTE?

Note: For an intro to CTEs, see N Things Worth Knowing About CTEs.

We stick the entire query from above into a CTE, and just select all columns WHERE RankNum = 1 (that is, for the cheapest item in each category).  [Edit: This query now works; I’d had too many column names in the CTE list.]

WITH RankedProducts ( RankNum, ProductNumber, ListPrice, Category, StandardCost )
AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY SC.Name ORDER BY P.ListPrice ) RankNum ,
    P.ProductNumber ,
    P.ListPrice ,
    SC.Name [Category],
    StandardCost
FROM Production.Product P
LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID
WHERE P.ProductSubCategoryID IS NOT NULL
    AND P.Color IS NOT NULL
)
SELECT ProductNumber ,
    ListPrice ,
    Category ,
    SUM(StandardCost) OVER () TotalStandardCost,
    SUM(ListPrice) OVER ( ) TotalCost
FROM RankedProducts
WHERE RankNum = 1
ORDER BY Category;

And you’ll notice there at the end, we added the sum of standard cost and of list price for the entire set…those numbers mean “sum of what it costs to make these items” and “sum of what we normally sell these items for”. Full result set:

ProductNumber ListPrice Category TotalStandardCost TotalCost
SB-M891-S 89.99 Bib-Shorts 2120.3968 3833.625
RB-9231 106.5 Brakes 2120.3968 3833.625
CA-1098 8.99 Caps 2120.3968 3833.625
CH-0234 20.24 Chains 2120.3968 3833.625
CS-4759 175.49 Cranksets 2120.3968 3833.625
FD-2342 91.49 Derailleurs 2120.3968 3833.625
GL-H102-S 24.49 Gloves 2120.3968 3833.625
HL-U509-R 34.99 Helmets 2120.3968 3833.625
HY-1023-70 54.99 Hydration Packs 2120.3968 3833.625
LJ-0192-S 49.99 Jerseys 2120.3968 3833.625
BK-M18B-40 539.99 Mountain Bikes 2120.3968 3833.625
FR-M21B-40 249.79 Mountain Frames 2120.3968 3833.625
PA-T100 125 Panniers 2120.3968 3833.625
PD-M282 40.49 Pedals 2120.3968 3833.625
BK-R19B-44 539.99 Road Bikes 2120.3968 3833.625
FR-R38B-58 337.22 Road Frames 2120.3968 3833.625
SH-M897-M 59.99 Shorts 2120.3968 3833.625
SO-R809-M 8.99 Socks 2120.3968 3833.625
TG-W091-S 74.99 Tights 2120.3968 3833.625
BK-T18U-54 742.35 Touring Bikes 2120.3968 3833.625
FR-T67U-50 333.42 Touring Frames 2120.3968 3833.625
VE-C304-S 63.5 Vests 2120.3968 3833.625
FW-M423 60.745 Wheels 2120.3968 3833.625

Now the sales folk have which items to put in the package, and the current profit margin (so they can play around with discounts).  It is, in short, a delicious heap of steaming potato-, catsup-, and cheesy-goodness.  Okay, now I want fries… 

For more on these particular examples, see the recording of T-SQL Brush-up (requires PASS login), given at 24 Hours of PASS Fall 2010 (and download the demo code here).

Happy Tuesday!
Jen McCown
http://www.MidnightDBA.com/Jen

4 thoughts on “T-SQL Tuesday #016: Aggregates and Windowing Functions and Ranking! Yum!

  1. Pingback: SQL Awesomesauce » Blog Archive » Learn T-SQL in 5 Minutes

  2. Pingback: Less Than Dot - Blog - Awesome

  3. Sherri

    Awesome content. You should use social websites to increase traffic.
    There are tools which automate this time consuming
    process.Visitors can flood your blog in no time, just search in google for:
    Rixisosa’s Social Automation

Comments are closed.