Using NOT IN operator with null values

The IN operator compares a value with a list of values. It must however be used with
care when we are dealing with nulls.

Let’s create a table containing three city names and a null value. The goal is check
whether a city is in the list or not.

-- By default ANSI_NULLS is off so null comparisons follows the SQL-92 standard.
-- In future version of SQL Server, it won't be possible to modify this setting.
SET ANSI_NULLS OFF

IF EXISTS(select 1 from INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='CITIES')
DROP TABLE [CITIES]

CREATE TABLE [CITIES] (CITY [varchar](50) NULL)

INSERT INTO CITIES
SELECT 'PARIS' UNION ALL
SELECT 'MONTREAL' UNION ALL
SELECT 'NEW YORK' UNION ALL
SELECT NULL

The table now contains the following city names:

PARIS
MONTREAL
NEW YORK
NULL

Let’s use the IN operator to determine if Montreal is in the city table:

SELECT'Found Montreal'
WHERE'Montreal' IN (SELECT city from CITIES)

Montreal is found so everything is all right. Now let’s try the following query to find out if Sidney appears in the table:

SELECT 'Found Sidney'
WHERE 'Sidney' IN (SELECT city from CITIES)

We still get the right result: Sidney is not in the list so no row is returned. Now to find out if Sidney is missing in the table, we would write something like that:

SELECT 'Sidney Not Found'
WHERE 'Sidney' NOT IN (SELECT city from CITIES)

However here something is definitevely wrong. Sidney is not in the list and still no rows is returned. Let’s try a different approach:

SELECT'Sidney Not Found'
WHERE 'Sidney' NOT IN ('Paris','Montreal','New York')

That one works. The null value affects the outcome of the NOT IN operator. This is because the operator compares each city in the list; the previous query is logically equivalent to the following query:

SELECT 'Sidney Not Found'
WHERE 'Sidney'<>'Paris'
AND 'Sidney'<>'Montreal'

We therefore get this logically equivalent query if we add a null value:

SELECT 'Sidney Not Found'
WHERE 'Sidney'<>'Paris'
AND 'Sidney'<>'Montreal'
AND 'Sidney'<>null

… and since, by default, “Sidney <>null” is UNKNOWN (neither true or false), no row is returned because every condition must be true in order for the AND operator to return a TRUE result. The same counter-intuitive result happens with the IN operator, like in this example:

SELECT city from CITIES
WHERE city in (select city from CITIES)

Here null is in the list disappeared, because NULL<>NULL.

When checking for existence, you should use the EXISTS operator if the columns involved are nullables. Using IN operator might produce an inferior plan and can lead to misleading results if a null value is inserted in the table. In our example, we can rewrite our query as:

SELECT 'Sidney Not Found'
WHERE NOT EXISTS
(SELECT 1/0 FROM CITIES WHERE CITY = 'Sidney')

The EXISTS operator returns TRUE if the subquery returns at least a row and FALSE otherwise. Also note that the columns returned by the subquery are never evaluated because there is no need to. That is why the previous query didn’t throw a “Divide by zero error”.

Share on TwitterSave on DeliciousDigg ThisShare via email

Comments

One response to “Using NOT IN operator with null values”

Leave your response
  1. scott strommen says:

    The following also works.

    SELECT’not Found SIDNEY’
    WHERE’Sidney’ NOT IN (SELECT city from CITIES WHERE CITY IS NOT NULL)