Inner, Outer, Where, What?

I heard a new term on a blog the other day (darn if I can remember who said it) that I liked: SQL Pro.  A SQL Pro may or may not be an actual DBA, or DB dev, or accidental DBA.  I've seen a lot more material directed at this nebulous group, and I like it.  In this spirit, let's talk about JOINs. I'm going to assume you've heard of them, and maybe have a vague idea of what they are. Basic syntax:

SELECT aTable.something ,

anotherTable.somethingElse

FROM aTable

JOIN anotherTable ON aTable.ID = anotherTable.ID

Short and sweet: A JOIN defines what two sets of data to stick together – in this case, table "aTable" and table "anotherTable" – and ON is the common link between them.  In our example above, each table has an ID, and we want to match rows together whose IDs match.  But you probably knew all that.

How about INNER vs OUTER joins?  An INNER join says, I want every row from each table that has a match on the other side. If it doesn't have a match, I don't wanna see it.  Let me say this now, because a lot of people get tripped up about it in interviews: THERE IS NO "LEFT INNER JOIN".  There is no "RIGHT INNER JOIN".  There is only INNER JOIN.  An example resultset from an INNER JOIN might look like this:

 

something somethingElse
bird feather
song note
dog fur

A LEFT OUTER JOIN says, I want every single row from the left table (reading the query left to right), and rows that match from the right table. Here's our example and resultset:

SELECT aTable.something ,

anotherTable.somethingElse

FROM aTable

LEFT OUTER JOIN anotherTable ON aTable.ID = anotherTable.ID

 

something somethingElse
bird feather
song note
computer NULL
scarf NULL
dog fur

See?  "Computer" and "scarf" didn't have a match in the somethingElse table, so they're matched with NULLs.  A RIGHT OUTER JOIN is the same, only we want all the rows from the right side, and whatever matches on the left.

There's lots more to know about JOINs, but today I want to highlight something that can trip you up. Let's say we're interested in money.   First, here's the script to make and populate our tables (just in case you wanna play along):

CREATE TABLE [Wallet] (WalletID int identity(1,1), WalletName VARCHAR(15))

CREATE TABLE [Money] (MoneyID int identity(1,1), WalletID INT, [Type] varchar(15), Amount int)

INSERT INTO Wallet VALUES ('My wallet')

INSERT INTO Wallet VALUES ('Your wallet')

INSERT INTO Wallet VALUES ('His wallet')

INSERT INTO [Money] VALUES (1, 'American', 10)

INSERT INTO [Money] VALUES (2, 'European', 1)

INSERT INTO [Money] VALUES (5, 'American', 10)

Let's find all the wallets in our database with money in them, where that money is American: 

SELECT [Wallet].* ,

[Money].*

FROM [Wallet]

INNER JOIN [Money] ON [Money].WalletID = [Wallet].WalletID

WHERE [Money].Type = 'American'

Nicely done. The inner join says, I want wallets, and I want money, but only if they're together. The WHERE clause says, oh yeah, and only give me those matching rows that have American money!  So far, so good.  But we suddenly become curious…just how many empty wallets are hanging around in the database, too?  Let's see ALL wallets, whether or not they have money.

SELECT [Wallet].* ,

[Money].*

FROM [Wallet]

LEFT OUTER JOIN [Money] ON [Money].WalletID = [Wallet].WalletID

Again, well done.  Wallet is our lefthand table, so we have a left join.  Whoa there, we forgot our "[Money].Type = 'American'" clause, and now we see empty wallets, wallets with American money, and wallets with other currency!  Right now we're not interested in other currency, so we have to fix our query.  I'm going to tell you in advance that this query is not going to give us what we want:

SELECT [Wallet].* ,

[Money].*

FROM [Wallet]

LEFT OUTER JOIN [Money] ON [Money].WalletID = [Wallet].WalletID

WHERE [Money].Type = 'American'

The above query does not show empty wallets (a wallet column with a NULL under the money columns)…it's only showing us wallets with American money.  So what went wrong?  Here's the scoop: the SQL engine processes a query in a certain order. First it gets all the relevant data, and then it narrows down that data using whatever's in the WHERE clause.  So we effectively told SQL, "First get us the LEFT OUTER JOIN data on Wallet and Money." And SQL said, "Okay, I have three rows here, all matched up. That last wallet is empty, dude!"  We said, "Great. Now look at that data you got, and narrow it down to JUST Money.Type = 'American'".  And SQL said, "Okee dokee.  There's just one row with Money.Type='American'.  Have a nice day!" 

We should have told SQL to narrow the data down by Type = 'American' OR Type IS NULL.

SELECT [Wallet].* ,

[Money].*

FROM [Wallet]

LEFT OUTER JOIN [Money] ON [Money].WalletID = [Wallet].WalletID

WHERE [money].TYPE = 'American'

OR [Money].TYPE IS NULL

Ahh, that's better.  And by the way, this is very much different than adding additional ON columns.  Let's play with that too, if you're up for it.  What does this do?

SELECT [Wallet].* ,

[Money].*

FROM [Wallet]

LEFT OUTER JOIN [Money] ON [Money].WalletID = [Wallet].WalletID

     AND [Money].Type = 'American'

Why, I'll tell you what it does! It returns all the left hand rows, and each row on the right that has the same WalletID AND Type = 'American'.  So you get NULLs under "Money" for the empty wallet, and the wallet with the Euros in it.  That's not necessarily a bad thing, if your goal isn't to find empty wallets, but just to display American currency and all wallets.

Get it? Sweet.

-Jen McCown, http://www.MidnightDBA.com

 

 

Philosophy and Query to Find Duplicate Records

I was inspired today to pass on my method for finding dupliate records.  I'm sure we could put together a SP to generate the duplicate records query for a given table, but that's not how I roll today…today, it's about the journey.

First understand that a duplicate record is whatever you say it is.  If your table has a unique column – for examlpe, an IDENTITY column as the surrogate primary key – you can't include that in your query, or you'll never find duplicate rows.  And there are certain other things that may not count in terms of duplicates…for example, a LastModifiedDate column. Maybe you don't care if two rows have different modified by dates, and it's only the data that counts.  It's your database, your business rules…only you know for sure what counts as a dupe.

So given that, here's the basic rundown for getting duplicate rows:

SELECT count(*) as NumberOfRows,

  firstName ,

  lastName ,

  phoneNumber ,

  customerAccount

FROM Customer

GROUP BY firstName ,

  lastName ,
  phoneNumber ,

  customerAccount

HAVING COUNT(*) > 1

Notice that we included a count(*) in the select list, just to see how many duplicates are in the table (it's not actually required, just nice to have).  Otherwise, the select list is the same as the GROUP BY list.  HAVING COUNT(*) > 1 lets us return only the rows that have duplicates.  Finally, I often include a MAX(CustomerID) and/or MIN(CustomerID) in the SELECT list, to get the first and last instances of the dulpicated rows.  That can come in handy when you want to alter or delete all but the most recent duplicate in each group.  But that's a blog for another day.  What they hay, let's see the query with the MIN and MAX:

SELECT count(*) as NumberOfRows,

  MIN(CustomerID),

  MAX(CustomerID),

  firstName ,

  lastName ,

  phoneNumber ,

  customerAccount

FROM Customer

GROUP BY firstName ,

  lastName ,
  phoneNumber ,

  customerAccount

HAVING COUNT(*) > 1

-Jen McCown, http://www.MidnightDBA.com

Code sins: Why so silent?

A job, like any thing in life, is very much an up and down, give and take.  Some weeks, I get to send emails like this (and I did):

Hey man, you wrote xyzProcedure back in march (so the proc says), and I just wanted to say thank you for using comments. It’s so much easier to develop on the back of others’ code if there’s a little explanation, and comments are something of a lost art.   

And then there are weeks where every procedure I open up looks like this:

CREATE PROCEDURE [dbo].[GetWidgetList] ( @StoreID INT = NULL )

AS

— ==============================================================================
— Author: Star Team
— Create date: 01/12/2009
— Description: Get Widget List
— ==============================================================================
SET NOCOUNT ON

IF @StoreID IS NOT NULL

BEGIN

–< insanely complicated code without comments here >

END

RETURN 0 ;

Oh, really?  Really??  I mean, nice use of commented seperation bars, and thank GOD I know when the frigging SP was written, but really?  It was written by the entire Star Team, not by an individual or group of individuals…in other words, anyone I could go to and ask for assistance, information, intelligent banter.  The description is the procedure name, with spaces (for readability!)  And I have 3 zillion lines of IF/THEN/ELSE/nested CURSOR code without a single useful comment, unless you count the aged out code that you've commented out instead of deleting.

Go aheady, kick my dog already.  Pour gasoline on my flowers.  Knock me down, steal my car, drink my liquor from the old fruit jar, but honey, put in some blue suede comments.  Here's a nice little example:

CREATE PROCEDURE [dbo].[GetWidgetList] ( @StoreID INT = NULL )

AS

— ==============================================================================
— Author: Jen McCown, Star Team.
— Email jen@jennifermccown.com
— Cel 123-456-7890
— Office 123-456-9874
— Create date: 01/12/2009
— Defect: CR6189
— Description: This procedure retrieves a list of active widgets for a
— particular store, along with configuration data from the Preet table.
— Note that if a widget has been aged out but is still active, it is
— displayed but marked as "moronic"; the Zeet users like this, but for the record
— it tends to confuse the BAs. Refer them to the WidgetBizRules document on
— Star Team sharepoint (url…).
— Return 0 – completed without errors.
— Return 1 – storeID does not exist
— …
— ==============================================================================
SET NOCOUNT ON

——————————-
— Check for existing StoreID
——————————-
IF @StoreID IS NOT NULL

BEGIN

——————————-

— section 1: declare & variables

——————————-

–< nice code with comments >

——————————-

— section 2: get config settings

— Notes blah blah blah…

——————————-

–…

END

RETURN 0 ;

Go thou and SIN NO MORE. 

-Jen McCown, http//www.MidnightDBA.com