Clarity in T-SQL: Adjoining Socks

One of my children is very literal, a true pedantic.

“Is there any homework you need to do?”

“No.”

“Really? Show me your homework sheet. … Why does it say ‘book report due’ for tomorrow?”

“Because the report is due tomorrow.”

[Deep breath.] “So why did you JUST tell me you don’t have any homework due.”

“Because I don’t have the book, and I can’t do the report. So I can’t do it.”

[More deep breaths, and a lengthy discussion following.]

As it turns out, you have to be very particular when talking to somebody as persnickity, as specific, as literal as that. Computers are even more persnickity, and so the questions you ask them have to be even more specific.

Querying? Be specific, Bob.

A T-SQL query is well named: a query is a question you’re asking the database. If you’ve been working with T-SQL for a while, you start to take queries for granted. You can write the code to find duplicates in your sleep. But sometimes, even “simple” queries take some thinking about. So, let’s think about them! Today, we’re thinking about joining tables.

In table 1, “Shoe”, we have a listing of all the shoes in your household. And table 2, “Sock”, holds all the socks.

Shoe: Brown shoe, Black shoe, Blue shoe

Sock: Brown sock, Black sock, White sock

We might need to ask the database to match shoes to socks based on size and color.

SELECT * FROM Shoe INNER JOIN Sock ON Shoe.Color = Sock.Color;

Simple and easy. We’re JOINing the two tables, and our criteria is Color=Color. No problem.  We might get back the Shoe/Sock pairs

Brown Shoe/Brown Sock

and

Black Shoe/Black Sock

Okay, now what if we want ALL the shoes, and the socks of matching color? Be specific there, Bob. We want all the shoes, whether or not they have a match, and matching socks where they exist. Pretty easy too, ish.

SELECT * FROM Shoe LEFT OUTER JOIN Sock ON Shoe.Color = Sock.Color;

That’s what left outer joins are for, right? It gets us everything from one table, regardless of the matching criteria. This query will get us the right answer, which is

Brown Shoe/Brown Sock

and

Black Shoe/Black Sock

and

Blue Shoe/NULL

Going Awry

Here’s where I’ve seen some people get tripped up. What we want now is all shoes that have NO matching color in the Sock table. The temptation is to write something like this:

SELECT * FROM Shoe INNER JOIN Sock ON Shoe.Color <> Sock.Color;

That seems to make sense. Sorta. On second thought, what we wrote here actually says “give me all mismatched pairs”. This query returns every iteration of mismatches: Brown/Black, Brown/White, Black/Brown, Black/White, Blue/Brown, Blue/Black.

Asking the right question

Okay, let’s rethink this. We’ve already seen we can get data from one table where there’s not a match, right? We used an OUTER JOIN for that.

SELECT * FROM Shoe LEFT OUTER JOIN Sock ON Shoe.Color = Sock.Color;

That gave us Brown/Brown, Black/Black, and Blue/NULL.  Soooo, if all we need to do now is get rid of those matches, then…well, SAY so!

SELECT * FROM Shoe LEFT OUTER JOIN Sock ON Shoe.Color = Sock.Color WHERE Sock.Color IS NULL;

This will return Blue/NULL.

Let’s take it one step further: We want to find shoes with no matches in Sock, and socks with no matches in Shoes. Well heck, that’d be a FULL outer join!

SELECT * FROM Shoe FULL OUTER JOIN Sock ON Shoe.Color = Sock.Color WHERE Sock.Color IS NULL OR Shoe.Color IS NULL;

Note we didn’t tell it WHERE Shoe is null AND Sock is NULL. That’s a great way to always get exactly zero rows returned. We need either Sock OR Shoe to return NULL, not both.

That’s about it, kids. And if any of you truly do keep an inventory of your socks and shoes in an RDBMS, I certainly hope it’s multithreaded.

Happy days,
Jen McCown
http://www.MidnightDBA.com/Jen