Inner, Outer, Where, What?

I heard a new term on a blog the other day (darn if I can remember who said it) that I liked: SQL Pro.  A SQL Pro may or may not be an actual DBA, or DB dev, or accidental DBA.  I've seen a lot more material directed at this nebulous group, and I like it.  In this spirit, let's talk about JOINs. I'm going to assume you've heard of them, and maybe have a vague idea of what they are. Basic syntax:

SELECT aTable.something ,

anotherTable.somethingElse

FROM aTable

JOIN anotherTable ON aTable.ID = anotherTable.ID

Short and sweet: A JOIN defines what two sets of data to stick together – in this case, table "aTable" and table "anotherTable" – and ON is the common link between them.  In our example above, each table has an ID, and we want to match rows together whose IDs match.  But you probably knew all that.

How about INNER vs OUTER joins?  An INNER join says, I want every row from each table that has a match on the other side. If it doesn't have a match, I don't wanna see it.  Let me say this now, because a lot of people get tripped up about it in interviews: THERE IS NO "LEFT INNER JOIN".  There is no "RIGHT INNER JOIN".  There is only INNER JOIN.  An example resultset from an INNER JOIN might look like this:

 

something somethingElse
bird feather
song note
dog fur

A LEFT OUTER JOIN says, I want every single row from the left table (reading the query left to right), and rows that match from the right table. Here's our example and resultset:

SELECT aTable.something ,

anotherTable.somethingElse

FROM aTable

LEFT OUTER JOIN anotherTable ON aTable.ID = anotherTable.ID

 

something somethingElse
bird feather
song note
computer NULL
scarf NULL
dog fur

See?  "Computer" and "scarf" didn't have a match in the somethingElse table, so they're matched with NULLs.  A RIGHT OUTER JOIN is the same, only we want all the rows from the right side, and whatever matches on the left.

There's lots more to know about JOINs, but today I want to highlight something that can trip you up. Let's say we're interested in money.   First, here's the script to make and populate our tables (just in case you wanna play along):

CREATE TABLE [Wallet] (WalletID int identity(1,1), WalletName VARCHAR(15))

CREATE TABLE [Money] (MoneyID int identity(1,1), WalletID INT, [Type] varchar(15), Amount int)

INSERT INTO Wallet VALUES ('My wallet')

INSERT INTO Wallet VALUES ('Your wallet')

INSERT INTO Wallet VALUES ('His wallet')

INSERT INTO [Money] VALUES (1, 'American', 10)

INSERT INTO [Money] VALUES (2, 'European', 1)

INSERT INTO [Money] VALUES (5, 'American', 10)

Let's find all the wallets in our database with money in them, where that money is American: 

SELECT [Wallet].* ,

[Money].*

FROM [Wallet]

INNER JOIN [Money] ON [Money].WalletID = [Wallet].WalletID

WHERE [Money].Type = 'American'

Nicely done. The inner join says, I want wallets, and I want money, but only if they're together. The WHERE clause says, oh yeah, and only give me those matching rows that have American money!  So far, so good.  But we suddenly become curious…just how many empty wallets are hanging around in the database, too?  Let's see ALL wallets, whether or not they have money.

SELECT [Wallet].* ,

[Money].*

FROM [Wallet]

LEFT OUTER JOIN [Money] ON [Money].WalletID = [Wallet].WalletID

Again, well done.  Wallet is our lefthand table, so we have a left join.  Whoa there, we forgot our "[Money].Type = 'American'" clause, and now we see empty wallets, wallets with American money, and wallets with other currency!  Right now we're not interested in other currency, so we have to fix our query.  I'm going to tell you in advance that this query is not going to give us what we want:

SELECT [Wallet].* ,

[Money].*

FROM [Wallet]

LEFT OUTER JOIN [Money] ON [Money].WalletID = [Wallet].WalletID

WHERE [Money].Type = 'American'

The above query does not show empty wallets (a wallet column with a NULL under the money columns)…it's only showing us wallets with American money.  So what went wrong?  Here's the scoop: the SQL engine processes a query in a certain order. First it gets all the relevant data, and then it narrows down that data using whatever's in the WHERE clause.  So we effectively told SQL, "First get us the LEFT OUTER JOIN data on Wallet and Money." And SQL said, "Okay, I have three rows here, all matched up. That last wallet is empty, dude!"  We said, "Great. Now look at that data you got, and narrow it down to JUST Money.Type = 'American'".  And SQL said, "Okee dokee.  There's just one row with Money.Type='American'.  Have a nice day!" 

We should have told SQL to narrow the data down by Type = 'American' OR Type IS NULL.

SELECT [Wallet].* ,

[Money].*

FROM [Wallet]

LEFT OUTER JOIN [Money] ON [Money].WalletID = [Wallet].WalletID

WHERE [money].TYPE = 'American'

OR [Money].TYPE IS NULL

Ahh, that's better.  And by the way, this is very much different than adding additional ON columns.  Let's play with that too, if you're up for it.  What does this do?

SELECT [Wallet].* ,

[Money].*

FROM [Wallet]

LEFT OUTER JOIN [Money] ON [Money].WalletID = [Wallet].WalletID

     AND [Money].Type = 'American'

Why, I'll tell you what it does! It returns all the left hand rows, and each row on the right that has the same WalletID AND Type = 'American'.  So you get NULLs under "Money" for the empty wallet, and the wallet with the Euros in it.  That's not necessarily a bad thing, if your goal isn't to find empty wallets, but just to display American currency and all wallets.

Get it? Sweet.

-Jen McCown, http://www.MidnightDBA.com