Category Archives: Applied SQL

Tricky TSQL: NOT IN (NULL)

We’ve done this before, but we can go one better this time.

Let’s take this step by step.

NULL means “I don’t know”. It stand for an unknown value.

Nothing can be equal to NULL. We simply can’t say that 1 = NULL, or ‘ABBA’ = NULL,  because we don’t know what value NULL might possibly be in the real world.

Nothing can be NOT equal to NULL. We can’t say that 1 <> NULL, or ‘ABBA’ <> NULL. NULL may well turn out to be 1, or ABBA, or whatever value we’re trying to check. We simply don’t know.

A simple example

So let’s take a very simple example, to demonstrate how this works. This particular query will return the value ‘True’, because 1 is demonstrably equal to 1:

IF 1 = 1
   SELECT   'True!';

Yep, that makes sense. I can tell: 1 = 1.

This query will not return ‘True’; it will return nothing, because 1 cannot be proven to be equal to NULL:

IF 1 = NULL
   SELECT   'True!';

We have an unknown, and nothing can possibly be equal to the unknown.

Another way to put this, is to use the keyword IN:

IF 1 IN ( SELECT    NULL AS a )
   SELECT   'True';

That parenthetical SELECT NULL simply returns NULL, so the query ends up asking: “is the value 1 in the set of values (NULL)?” We have no idea, so the query does not return ‘True’.

Almost the same thing, but trickier

Here’s yet another query that will not return ‘True’; 1 cannot be proven to be NOT equal to NULL:

IF 1 <> NULL
   SELECT   'True!';

NULL is unknown, and nothing can possibly be NOT equal to the unknown.

NULL will mess up your  NOT IN queries

And the absolute trickiest bit – the one that will trip you up, if you’re not thinking? NOT IN.

This query also returns nothing!

IF 1 NOT IN ( SELECT    2 AS a
              UNION
              SELECT    NULL AS a )
   SELECT   'True';

The IF statement asks: “is 1 in the collection of values (2, NULL)?” And we must say, “ah, no idea. That can’t be proven. Therefore we can’t return true.”

The value 1 cannot be determined to not be in the set of (2, NULL).

Play around with these, let it sink in. And watch out for NULLs in your NOT IN result sets!

One Fix

So, always use IS NULL or IS NOT NULL for your NULL comparison needs. And if you have a [value] NOT IN situation, read on.

Here’s what I did* to fix this exact situation in a recent query:

 SELECT ID
 FROM   Table1
 WHERE  ID NOT IN ( SELECT  ID
                    FROM    Table2
                    WHERE   ID > 0 );

While we could have used ID IS NOT NULL instead of ID > 0, it would make any relevant index on Table2 useless to us. In this particular case, ID is restricted to only positive above 0, and NULL.

To read up on even more solutions, see Mladen Pradjic’s article JOIN vs IN EXISTS.

Happy days,
Jen
www.MidnightDBA.com/Jen

 

 

Slash star, dash dash: Be obsessive about comments

I’m working on a new session* that I’ve named “T-SQL’s Hidden Support Feature”. It’s about comments. I’ll get to the reasons why I’m writing and presenting an hour long session on /* */ and — a bit later, but first: Why am I calling comments a “hidden support feature”?

Comments themselves aren’t hidden; it’s the idea of comments as a support feature that’s stuck back behind the curtains somewhere. You see, I can’t find a single blog anywhere about comments. (Well, I found one, but it was comparing comment features between RDBMSs.)

Comments support code

Comments are a form of documentation. One of the best forms, if you do it right. And documentation supports code…rather, makes it more supportable, by making it more understandable to those who need to deal with it.

In what ways do comments beat traditional documentation?

  • Comments actually get written. Few IT professionals like to stop coding to go write about their code. But most of us can be convinced to jot down applicable notes AS we code.
  • They’re handy-dandy, readily available. Comments are not a separate document. So they’re always with the relevent code, instead of being hidden away in some aged-out, unused server that no one knows about. The people that NEED them will SEE them, when they open the SP (or function, or view, etc.)
  • They’re easy to update. NOBODY likes to update documents. Fix code, open doc, change doc, save doc. Nope nope nope. But, if the “documentation” is right there IN the code, I can modify it as I modify the code.

Plant some comments in a flower box

Let’s take the Minion.BackupMaster SP as an example. Download MB and follow along at home, if you like, or just press on without detailed visual aids.

MB1

Right there at the top, we’ve got some information about one of the parameters. With a little bit of looking, most of the other parameters are (rather, might be?) largely self explanatory. But @Include has some interesting options that wouldn’t be noticed without digging into the code.

After that we have what’s called the “flower box”: the big slash-star comment at the top of an object that has a lot of standard information**, like:

  • who wrote the object (and any copyright or source URL)
  • when it was written
  • what it’s for
  • modification log

That’s already a lot of good information. I can’t tell you how many stored procedures I’ve opened up, only to wish I knew who owned it, how old it was, when it was last modified, or what in god’s name anyone thought it might be useful for.

We like to add more, even more useful, information to this flowerbox:

  • other information and resources about our organization(s)
  • what the parameters are for. (Remember how I said that most params are self explanatory? They aren’t. Document them.)
  • what temporary tables are created, and how they’re used
  • example execution statements (for a stored procedure or function)
  • table of contents (SP / function)

All of this stuff is useful to me, as the author – because coders write a lot of code, and we’re going to forget the nuances in about a week’s time. And it’s useful to me as the support staff, because I might have no earthly idea what @ReadOnly is for. It’s a LOT easier and faster – not to mention more accurate – to get a short explanation in the flower box, than it is to tear this entire 2,500 stored procedure (and any dependent objects) apart.

END;

I don’t want to give away the whole session. In fact, I don’t even want to give away the whole first 10 minute section of the session. You’ll have to wait until I give it at SQL Saturday Oklahoma City (or, until the recording is posted afterward).

But start with the flower boxes. Get into that habit; make your code more supportable.

-Jen
http://www.MidnightDBA.com/Jen

*The session started in my head years ago, and launched a few weeks ago when I was double-dog-dared to write it by Oklahoma City user group members. As some wise cartoon character said, friends can get one into such trouble.

** @EdDebug pointed out that the source and modification log I mentioned should really be taken care of by the use of source control. I agree. I also know that many of the organizations that I’ve been in do not use source control for their SQL code. With any luck, the trend is changing,

“Thing IS NOT NULL” isn’t the same as “Thing != NULL”

Nonsensical questions: What color do you want that DB?
Nonsensical questions: What color do you want that DB?

I’ve talked about NULLs before. (Ref: Here’s a bit on ISNULL(), and here’s one about understanding NULL at a basic level.)  And yet, it’s worthwhile to bring these things up from time to time.

“[Thing] != NULL” is nonsense. It won’t work the way you want it to work, because NULL literally means “unknown”. Nothing in the world can be equal to, or not equal to, an unknown.

The TechNet article on IS [NOT] NULL says:

To determine whether an expression is NULL, use IS NULL or IS NOT NULL instead of comparison operators (such as = or !=). Comparison operators return UNKNOWN when either or both arguments are NULL.

Example time – what does != NULL do?

Let’s walk through an example.

This will create a temporary table and add three rows, one of which has a NULL value in the column “mychar”:

CREATE TABLE #tmp (id INT IDENTITY(1,1), mychar CHAR(1) NULL);
INSERT INTO #tmp (mychar) VALUES ('a'), ('b'), (NULL);
SELECT id, mychar FROM #tmp;

So, the data in our table is:

id mychar
1 a
2 b
3 NULL

Now it’s very easy to demonstrate how the IS NULL/IS NOT NULL predicates will compare against the =/!= comparison operators.

IS NULL vs =

First, let’s take a look at IS NULL versus = NULL.

select id, mychar FROM #tmp WHERE mychar IS NULL;
select id, mychar FROM #tmp WHERE mychar = NULL;

The IS NULL query returns one row, as it should:

id mychar
3 NULL

But the “= NULL” query returns no rows. Why? Because SQL has NO IDEA what you’re talking about. “Does mychar equal an unknown? Um. I don’t know? Whatever, here’s your empty result set.

IS NOT NULL vs !=

And now, IS NOT NULL versus != NULL.

select id, mychar FROM #tmp WHERE mychar IS NOT NULL;
select id, mychar FROM #tmp WHERE mychar != NULL;

The IS NULL query returns two rows, as it should:

id mychar
1 a
2 b

But again, the query with the comparison operator “!= NULL” returns no rows.  “Does mychar not equal an unknown?” is, as far as SQL Server is concerned, not a valid question. It’s exactly like asking a coworker, “What sized year did you eat for lunch?” or “What color do you want that database?

A little discussion…

As to whether it’s faster, as someone asked, to use = NULL and != NULL…I haven’t run the benchmarks, as it’s nonsense. But I’d assume these would be faster than IS NULL/IS NOT NULL, since SQL doesn’t bother trying to work out “is this equal to unknown”. You’re just not going to get any rows back.

If you’d like to speed up performance in this way, it’d be much simpler and clearer to use WHERE 1=2; or to not write the query at all.

Happy Days,
Jen
http://www.MidnightDBA.com/Jen

Have you checked out Minion Reindex yet? It’s our free, easy, comprehensive index maintenance solution. You’ll love it, seriously.