N Things Worth Knowing About SELECT

Whoa.
SELECT is this kind of Swiss Army Knife

SELECT is our bedrock, our foundation, our now-and-forever T-SQL multitasker…and it’s one of the most complicated structures available to us.  Here are a few things worth knowing about it, from the basic to the obscure. 


More-or-Less Basic Structure

There’s tons more that can go into the SELECT statement, of course, but the basic structure is: 

WITH <CTE>
SELECT <columns>
FROM <table
INNER | OUTER | FULL JOIN <table> ON <criteria>
WHERE <criteria>
GROUP BY <column(s)>
HAVING <criteria>
ORDER BY <columns>
 

Interesting side note: Did you know that the HAVING clause does not require a GROUP BY clause? Of course then, the HAVING just behaves like a WHERE. Still, that could help you out on Quiz Bowl night at PASS.  Edit: HAVING without GROUP BY is sort of a quirk; to see an example of this, see Rob Farley’s bloT-SQL Tuesday – T-SQL Puzzle with HAVING.

And a more useful side note: In 2005 and later versions, you can ORDER BY columns that aren’t in your SELECT list. So I can SELECT FirstName, MiddleName FROM Table1 ORDER BY LastName, if I so choose. 

Data Sources

You can SELECT data from a constant (such as “SELECT 100”), a variable, a table, temporary table, table variable, view, or table-valued function. 

CTEs

If you haven’t messed with them, CTEs (Common Table Expressions) are actually pretty #awesomesauce. A CTE is, in essence, a temporary view attached to your select statement. Here’s very simple example: 

WITH JensTable (ID, Column1)
AS
(SELECT TOP 10 ID, Column1 FROM Jen.MyTable)
SELECT ID, Column1
FROM JensTable
WHERE ID < 500
 

“With ___” gives the CTE a name; the first parentheses define the columns that we can access; and the parenthetical SELECT (line 3) pulls data for our CTE.  One very important note: this is one of the very few T-SQL statements that has a semicolon requirement: a statement preceding the CTE must end with a semicolon. For example: 

SELECT 'We are starting!';
WITH JensTable (ID, Column1)
AS
(SELECT TOP 10 ID, Column1 FROM Jen.MyTable)
SELECT ID, Column1
FROM JensTable
WHERE ID < 500
 

“SELECT ‘We are starting!'” MUST end in a semicolon, or this batch will throw an error. 

Widgets You Can Throw Into SELECT

On top of the regular structure (of which nearly everything except the word “SELECT” is optional), you can also throw in 

  • TOP keyword, to get the top N or top X% rows. Use TOP <whatever> WITH TIES to get extra rows, if they have the same value as rows in the TOP list. 

Interesting side note: You can use TOP with INSERT and UPDATE statements (INSERT TOP (2) INTO Table1 (col1) SELECT col1 FROM MyTable ORDER BY col1), but it’s not constrained by the ORDER BY in the select statement…you’re essentially inserting two random rows. The workaround?  A TOP clause in the select statement (INSERT TOP (2) INTO Table1 (col1) SELECT TOP 2 col1 FROM MyTable ORDER BY col1). End of side note.  

  • DISTINCT, to get unique rows (NULLs are treated as equals, just this once)
  • INTO, to pipe your data into a very useful table or table variable.
  • FOR XML, if you’re into that kind of thing.
  • UNION, to stack one resultset on top of another  (column lists must be identically named/typed)
  • EXCEPT or INTERSECT, which is very like UNION (cols must be same in name and type), but instead of adding, you’re looking for rows that match (intersect) or that are missing from one side (except). This is rather like a shorthand JOIN, where you don’t have to write out all the criteria (A.col1 = B.Col1 AND A.col2 = B.Col2 etc etc).
  • Subqueries. It’s amazing the places you can stick subqueries, truly…BOL says “a subquery can be used anywhere an expression is allowed”.  That’s a whoooole nother article, though (or several!).

How SQL Server processes the SELECT statement

The BOL article on SELECT gives us the processing order of the select statement: 

  1. FROM
  2. ON
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. WITH CUBE or WITH ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

In other words, when SQL Server first looks at your select statement, it first wants to know where the data is coming from (FROM and ON). Then, if there are joins, it wants to know how to join them together…this limits or expands the resultset.  And so on. 

About a year ago, I reviewed Itzik Ben-Gan’s book “Inside MS SQL Server 2008: T-SQL Querying” (I called it “an absolute gold mine for T-SQL developers, both as a study guide and as a mark-it, underline-it, dogear-it reference”, and I stand by that).  Chapter 1 of that book gives the best discussion I’ve ever seen on logical query processing, and I urge you to run RIGHT out to Amazon and buy it, or at least borrow it from someone in your user group.


I think that’s a nice start on our SELECT statement trivia. What about you…what are your most/least favorite SELECT statement factlets?

Happy days,

Jen McCown

http://www.MidnightDBA.com/Jen