Thinking All the Way Around a Query: Bonus Material

While I was filming the video to Thinking…Query, named Insert Non-Duplicate Rows, I got the question: instead of a correlated subquery, couldn't you just use a join with a negative condition? So I included that discussion in the video.

To review: the problem we're trying to resolve is inserting rows FROM Table1 INTO Table2, ignoring duplicates. The problem with using a JOIN is that it's looking for rows that satisfy the JOIN condition….that is, it wants to resolve to true. Trying to join on a negative condition goes against the nature of JOINS. Let's explore that idea.

Here's our data:

Table1
1, 1
2, 3
4, 5
…

Table2
1, 1
2, 3
8, 0

And here's our attempt at inserting nonduplicate rows with a JOIN:

select * from table2
INSERT INTO table2
SELECT t1.col1, t1.col2
FROM TABLE1 t1
JOIN table2 t2 on t2.col1 <> t1.col1
AND t2.col2 <> t1.col2

So let’s go through the logical step-by-step of how SQL Server will process this join. 1. Compare first row from Table1 (col1 = 1, col2 = 1) against the first row in Table2 (col1 = 1, col2 = 1).
Our join criteria is t2.col1 <> t1.col1 AND t2.col2 <> t1.col2 . When you plug these values into the join criteria, it reads “1 <> 1 AND 1 <> 1”. This is clearly FALSE, so SQL moves on to the next step.2. Compare the first row from Table1 (col1 = 1, col2 = 1) against the second row in Table2 (col1 = 2, col2 = 3). This reads, “2 <> 1 AND 3 <> 1”. Now, THIS is true, so the JOIN condition has been satisfied. Therefore, the query attempts to insert the Table1 row (1,1). But this violates our primary key constraint, and is the opposite of what we want to do.

That’s really all there is to say about that. Why can’t we use a join with not equals? Because it won’t work.

-Jen McCown, http://www.MidnightDBA.com