Home » Beginner »Intermediate »sql server »sqlserverpedia-syndication »SSC »TSQL » Currently Reading:

SELECT, Deconstructed

Today let’s expand on the logical processing order of SELECT that I mentioned in last week’s N Things Worth Knowing About SELECT blog.

We’re looking at the SELECT statement clauses in the order that the SQL engine logically processes them…we’ll even write it that way – it’ll look weird, but we’ll be reading it like the SQL engine does.

You can find more on SELECT’s logical processing order in BOL, and in Itzik Ben-Gan’s T-SQL Fundamentals and T-SQL Programming books.

Here’s our basic T-SQL query (using AdventureWorks):

SELECT P.Name ,
  P.ProductNumber ,
  P.Color ,
  P.StandardCost ,
  SC.Name [Category]
FROM Production.Product P
LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID
WHERE P.ProductSubCategoryID IS NOT NULL

If we get all pseudo-mathematical on this, we can say that there’s some theoretical set of rows that we’ll get back out of this query; each logical step – in order – further refines that set of rows until we get back the actual rowset we want.

FROM

The SQL engine starts with the FROM clause first, to see where the data’s coming from. In this case we’re pulling from the Product table…easy. So, the theoretical rowset right now is everything in the Product table. (You can think about each step passing on that theoretical rowset to the next step for further refinement.)

-- Logical order:

FROM Production.Product P

ON/JOIN

Next we look at the ON and JOIN clauses: JOIN tells us what data structure (remember, it doesn’t have to be a table) we’re hooking up with, and ON give us the criteria for hooking our result sets together. We’re still building our theoretical resultset here; a JOIN can either expand the resultset, or limit it.

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC

Notice that the ON clause is evaluated before the WHERE clause; so we have to understand that where we put a limiting clause (like Color=’black’) can make a huge difference in our resultset. In this case, including the search condition “color=’black’” in the ON clause makes no difference, because it’s a LEFT OUTER JOIN…it’s going to return all the rows from the left side – black, red, blue, etc. – regardless of the join condition. Go on, try it:

SELECT P.Name ,
  P.ProductNumber ,
  P.Color ,
  P.StandardCost ,
  SC.Name [Category]
FROM Production.Product P
LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID
AND P.color='black'
WHERE P.ProductSubCategoryID IS NOT NULL

Toldya. In the ON clause, we’re comparing resultsets to make a match in a specific way. In the WHERE clause, we’re limiting the resultset that we got. Now, if this were an inner join, we could include the color=’black’ in the ON clause; it would limit the resultset just as if we’d said it IN WHERE. But I’m still more likely to put that condition in the WHERE clause; it’s more “proper”, in the sense that if that query ever needs to change to an outer join, the condition belongs in WHERE.

WHERE

Speaking of where…this is clearly the place where we’d want to whittle our resultset down to just the rows that we really want.

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'

In this statement, it’s the last refinement we make to our resultset, so next the SQL engine would evaluate the SELECT clause to see what columns to pull, and our logically-ordered query would look like this:

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'
SELECT P.Name ,
  P.ProductNumber ,
  P.Color ,
  P.StandardCost ,
  SC.Name [Category]

But, there’s a lot more we can do in a SELECT statement, so let’s continue with a more complex query that includes GROUP BY, HAVING, and ORDER BY:

SELECT P.Color ,
  COUNT(*) ItemCount,
  MIN(P.StandardCost) MinCost ,
  MAX(P.StandardCost) MaxCost ,
  AVG(P.StandardCost) AvgCost ,
  SC.Name [Category]
FROM Production.Product P
LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID
WHERE P.ProductSubCategoryID IS NOT NULL
AND P.color='black'
AND AVG(P.StandardCost) > 100
GROUP BY SC.Name, P.color
HAVING AvgCost > 100

GROUP BY

After SQL gets the base rowset down (by evaluating FROM, JOIN/ON, and WHERE), it looks at the GROUP clause to see if we’re going to lump some rows together. Notice, by the way, that we STILL haven’t looked at the actual SELECT clause:

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'
GROUP BY SC.Name, P.color

Today we want some aggregate data about each category, so we’ll GROUP BY the category name and color – when grouping, anything that’s not an aggregate (SUM, MIN, MAX, COUNT, etc) in the select list, MUST to be in the GROUP BY clause.

HAVING

A quick bit of trivia: the HAVING clause doesn’t require a GROUP BY clause; in that case, it just behaves like a WHERE. Oh, and speaking of WHERE…why couldn’t we just put the “AND AVG(P.StandardCost) > 100″ in the WHERE clause? Isn’t it a limiting factor, just like Color and subcategoryID?

Well, yeah it is, but remember: SQL evaluated WHERE, then GROUP BY (the aggregation clause), and THEN the HAVING clause. The aggregate isn’t allowed in the WHERE clause, because at that point, SQL hadn’t heard anything about grouping data…all the rows were still autonomous. So, aggregates go in HAVING:

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'
GROUP BY SC.Name, P.color
HAVING AvgCost > 100

SELECT

After HAVING, SQL finally takes a look at the SELECT clause itself, and our theoretical resultset becomes an actual resultset.

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'
GROUP BY SC.Name, P.color
HAVING AvgCost > 100
SELECT P.Color ,
COUNT(*) ItemCount,
MIN(P.StandardCost) MinCost ,
MAX(P.StandardCost) MaxCost ,
AVG(P.StandardCost) AvgCost ,
SC.Name [Category]

DISTINCT, ORDER BY, TOP

From here, SQL will process DISTINCT (if we included it), then ORDER BY (if we included it), then TOP (ditto). Let’s go ahead and include an ORDER BY:

-- Logical order:

FROM Production.Product P
ON P.ProductSubCategoryID = SC.ProductSubCategoryID
LEFT OUTER JOIN Production.ProductSubCategory SC
WHERE P.ProductSubCategoryID IS NOT NULL AND Color='black'
GROUP BY SC.Name, P.color
HAVING AvgCost > 100
SELECT P.Color ,
  COUNT(*) ItemCount,
  MIN(P.StandardCost) MinCost ,
  MAX(P.StandardCost) MaxCost ,
  AVG(P.StandardCost) AvgCost ,
  SC.Name [Category]
ORDER BY AvgCost

Here’s a big mystery revealed: THIS is why the ORDER BY clause is the ONLY clause that can refer to column aliases: ORDER BY is the only full clause that’s evaluated after the aliases are! Put it this way: we read the query top down, while SQL saves the top for nearly last. When you shout an alias name at it in your WHERE, SQL has NO IDEA what on earth you’re talking about.

And….there you go. Happy days,

Jen

http://www.MidnightDBA.com/Jen

Currently there are "4 comments" on this Article:

  1. [...] This post was mentioned on Twitter by Bob Pusateri, Jen & Sean McCown. Jen & Sean McCown said: Blog: SELECT, Deconstructed http://tinyurl.com/2cu4cvr SELECT statement clauses in the order that the SQL engine logically processes them… [...]

  2. [...] N Things Worth Knowing About SELECT and SELECT, Deconstructed [...]

  3. Ben Smith says:

    Thanks for the post, Jen, very helpful.

    Did want to ask about one thing to make sure I’m not misunderstanding. About midway through the post you have the statement:

    *****
    SELECT P.Color ,
    COUNT(*) ItemCount,
    MIN(P.StandardCost) MinCost ,
    MAX(P.StandardCost) MaxCost ,
    AVG(P.StandardCost) AvgCost ,
    SC.Name [Category]
    FROM Production.Product P
    LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID
    WHERE P.ProductSubCategoryID IS NOT NULL
    AND P.color=’black’
    AND AVG(P.StandardCost) > 100
    GROUP BY SC.Name, P.color
    HAVING AvgCost > 100
    ***

    But later on you mention that aggregates do not belong in the WHERE clause, but rather the HAVING clause. In the statement above, should the AND AVG(P.StandardCost) > 100 be removed from the WHERE clause?

    Another related question… you state that ORDER BY is the only clause that can refer to column aliases. Why then does the HAVING clause in the SQL above refer to a column alias?

    Thanks for the clarification,
    Benjamin

Comment on this Article:







MidnightSQL Consulting

Need help? Got an emergency? Write us at Support@MidnightDBA.com!

We can schedule time to help with your backup/restore issues, high availability and disaster recovery setup, performance problems, and a great deal more. Very often, we're even available on the moment for downtime issues and emergencies.

For more information about MidnightSQL consulting, email us or check out www.MidnightSQL.com. Happy days!

Where are We?

September 13: SQL Saturday Kansas City
November 3-7: PASS Summit, Seattle, WADecember 11: Presenting "Powershell Cmdlets.." at Alaska SQL Server User Group

Blog Posts by Category

DBAs@Midnight

How to Eat Pop-tarts
Watch DBAs@Midnight live on Fridays,m 11pm Central time

The best database career advice you’ve never heard!

DBARoadmap.com

The DBA Roadmap Seminar is 7 MP3 tracks (over 5 hours!) of insider guidance on your database career. We'll teach you how and what to study as a DBA, weigh in on controversial resume debates, teach you to recognize a worthy recruiter, and discuss the new professionalism of interviews. Also some bonus materials, PDF companion guides, and really spiffy intro music!

Once your $99 PayPal payment is submitted, you'll get the download link in e-mail! (Download is a 370Mb ZIP file.)

Become a DBA. Become a BETTER DBA. Use the Roadmap.

Visit www.DBARoadmap.com for info, forums, and more!

Add to Cart View Cart

Cunningham’s Law

"The best way to get the right answer on the Internet is not to ask a question, it's to post the wrong answer."
Relevant: http://xkcd.com/386/