Home » Beginner »Intermediate »sql server »sqlserverpedia-syndication »SSC »TSQL » Currently Reading:

Compare tables to find missing rows

Get it? Missing rows?

Get it?

Let’s talk about the case where you want to compare tables to see what’s missing. We might be comparing a list of orders to an imported set of data in a staging table. Or, we’re identifying customers in the database that aren’t on the call list. Whatever the application, finding out the difference between one table and another is a very common need.

First, a bit of setup

We’ll set up a couple of temporary tables for our example, and load them up with a few random schema and table names. That’s a nice generic set of data. If we want to imagine a scenario for this data, this is something we might use in a custom maintenance routine (e.g., “which tables in the database weren’t reindexed last week”?).

CREATE TABLE #AllTableList
    (
      SchemaName SYSNAME ,
      TableName SYSNAME
    );

CREATE TABLE #MyTableList
    (
      SchemaName SYSNAME ,
      TableName SYSNAME
    );

INSERT  INTO #AllTableList
        ( SchemaName ,
          TableName
        )
        SELECT TOP 10
                s.name AS SchemaName ,
                o.name AS TableName
        FROM    sys.objects o
                JOIN sys.schemas s ON o.schema_id = s.schema_id
        WHERE   o.type = 'u'; 

INSERT  INTO #MyTableList
        ( SchemaName ,
          TableName
        )
        SELECT TOP 5
                s.name AS SchemaName ,
                o.name AS TableName
        FROM    sys.objects o
                JOIN sys.schemas s ON o.schema_id = s.schema_id
        WHERE   o.type = 'u';

There we go, one table with “all” the data, and one table with rows “missing”. So how do we tell the difference?

This query is wrong

If we reason it out, it makes sense that you’d have to join the tables together to see where the table and schema names within are NOT equal. Let’s see if that works:

SELECT DISTINCT
        al.SchemaName ,
        al.TableName 
FROM    #AllTableList AS al
        INNER JOIN #MyTableList AS my ON my.TableName <> al.TableName;

Run this, and you get back 45 rows. What went wrong?

What happened is exactly what we asked for: SQL joined every schema.tablename from “All” with every one from “My” that didn’t match. Go ahead and add the “My” columns to the select list, and you’ll see what I’m talking about:

SELECT DISTINCT
        al.SchemaName ,
        al.TableName ,
        my.SchemaName ,
        my.TableName
FROM    #AllTableList AS al
        INNER JOIN #MyTableList AS my ON my.TableName <> al.TableName;

Now in our resultset, we see that dbo.dep1 is paired with dbo.dep2, and so on. We asked for mismatches, we got that. We didn’t get any info about missing data!

img2

One right way: OUTER JOIN

We want to look at all the data from the “All” table, whether or not it matches, and then discard the matches, because we only want tables that don’t have a match. Let’s do this in two steps.

First, to get back all the data from “All”, we need a LEFT OUTER JOIN.  And don’t forget we said “whether or not it matches”…if we’re talking about matches, then we need to get matches! That means we need to use a match criteria: equals (=) instead of not equals (<>).   Oh, and we’ll want to add SchemaName back into the mix….it takes a table name and a schema to qualify as a match, in this case.

SELECT DISTINCT
        al.SchemaName ,
        al.TableName ,
        my.SchemaName ,
        my.TableName
FROM    #AllTableList AS al
        LEFT OUTER JOIN #MyTableList AS my ON my.TableName = al.TableName
                                              AND my.SchemaName = al.SchemaName; -- not quite right yet

Now, that’s more like it! We can clearly see where the table names and schemas match, and where we’re missing data!

img3

So all that’s missing is a WHERE my.TableName IS NULL clause, and we’re golden:

SELECT DISTINCT
        al.SchemaName ,
        al.TableName ,
        my.SchemaName ,
        my.TableName
FROM    #AllTableList AS al
        LEFT OUTER JOIN #MyTableList AS my ON my.TableName = al.TableName
                                              AND my.SchemaName = al.SchemaName
        WHERE my.TableName IS NULL; -- Right! Shows missing rows from #MyTableList.

img4

We’ve found the missing data! Of course, there’s more than one way to skin a query.

Another right way: WHERE NOT EXISTS

Very similar to the last example, we can compare the two result sets based on a matching criteria, and only return the rows that don’t exist in #MyTableList. The following query does exactly that, with a correlated subquery:

SELECT  al.SchemaName ,
        al.TableName
FROM    #AllTableList AS al
WHERE   NOT EXISTS ( SELECT *
                     FROM   #MyTableList AS my
                     WHERE  my.TableName = al.TableName
                            AND my.SchemaName = al.SchemaName );

In pseudocode: Gimme All, where a match doesn’t exist (NOT EXISTS) in #MyTableList (as compared by TableName and SchemaName). Pretty cool, right?

Another right way: EXCEPT

The EXCEPT operator compares two result sets, column to column, and returns everything from the first set that isn’t in the second. So we can do exactly what we just did, like this:

SELECT  SchemaName ,
        TableName
FROM    #AllTableList
EXCEPT
SELECT  SchemaName ,
        TableName
FROM    #MyTableList;

Read in pseudocode, this says “give me schema and table from #AllTableList, unless you see that schema/table pair in #MyTableList”. Nice, eh?

These aren’t the only ways to get this information, of course. They’re just some of the most common.

Happy days,
Jen McCown
www.MidnightDBA.com/Jen

For a little more on this topic – most especially why you might not want to use “NOT IN”, take a quick look at #4 on “The Ten Most Asked SQL Server Questions And Their Answers” on Less Than Dot.

Currently there are "3 comments" on this Article:

  1. [...] Compare tables to find missing rows – Sharing a T-SQL walkthrough this week, it’s Jen McCown (Blog|Twitter). [...]

  2. Thomas says:

    Good Article Jen.

    Thomas

  3. [...] McCown’s Compare tables to find missing rows presents several solutions, using T-SQL, to compare two tables, to determine which records are [...]

Comment on this Article:







Minion Reindex by MidnightDBA is here!

 

Excellent Index Maintenance

Download Minion Reindex, log feature requests, read documentation, and sign up for the newsletter at MidnightSQL.com/Minion!




Monday, Oct 27 12:00PM CDT: Attend the Minion Reindex Intro Webinar.

MidnightSQL Consulting

Need help? Got an emergency? Write us at Support@MidnightDBA.com!

We can schedule time to help with your backup/restore issues, high availability and disaster recovery setup, performance problems, and a great deal more. Very often, we're even available on the moment for downtime issues and emergencies.

For more information about MidnightSQL consulting, email us or check out www.MidnightSQL.com. Happy days!

Where are We?

November 3-7: PASS Summit, Seattle, WA

PASS Summit: Jen is presenting How to Interview a DBA: A Panel Debate on Thursday 11/6 1:30pm, room 401 (along with Adam Machanic, Sean McCown, Bob Pusateri, and Michelle Ufford).

PASS Summit: Sean is presenting Performance Tuning Your Backups on Wednesday 11/5 3:00pm, room 602-604.

December 11: Presenting "Powershell Cmdlets.." at Alaska SQL Server User Group

January 30: "Become an Enterprise DBA" precon at Austin SQL Saturday

Blog Posts by Category

DBAs@Midnight

How to Eat Pop-tarts
Watch DBAs@Midnight live on Fridays,m 11pm Central time

The best database career advice you’ve never heard!

DBARoadmap.com

The DBA Roadmap Seminar is 7 MP3 tracks (over 5 hours!) of insider guidance on your database career. We'll teach you how and what to study as a DBA, weigh in on controversial resume debates, teach you to recognize a worthy recruiter, and discuss the new professionalism of interviews. Also some bonus materials, PDF companion guides, and really spiffy intro music!

Once your $99 PayPal payment is submitted, you'll get the download link in e-mail! (Download is a 370Mb ZIP file.)

Become a DBA. Become a BETTER DBA. Use the Roadmap.

Visit www.DBARoadmap.com for info, forums, and more!

Add to Cart View Cart

Cunningham’s Law

"The best way to get the right answer on the Internet is not to ask a question, it's to post the wrong answer."
Relevant: http://xkcd.com/386/