Using NOT IN operator with null values

The IN operator compares a value with a list of values. It must however be used with
care when we are dealing with nulls.

Let’s create a table containing three city names and a null value. The goal is check
whether a city is in the list or not.

-- By default ANSI_NULLS is off so null comparisons follows the SQL-92 standard.
-- In future version of SQL Server, it won't be possible to modify this setting.
SET ANSI_NULLS OFF

IF EXISTS(select 1 from INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='CITIES')
DROP TABLE [CITIES]

CREATE TABLE [CITIES] (CITY [varchar](50) NULL)

INSERT INTO CITIES
SELECT 'PARIS' UNION ALL
SELECT 'MONTREAL' UNION ALL
SELECT 'NEW YORK' UNION ALL
SELECT NULL

The table now contains the following city names:

PARIS
MONTREAL
NEW YORK
NULL

Let’s use the IN operator to determine if Montreal is in the city table:

SELECT'Found Montreal'
WHERE'Montreal' IN (SELECT city from CITIES)

Montreal is found so everything is all right. Now let’s try the following query to find out if Sidney appears in the table:

SELECT 'Found Sidney'
WHERE 'Sidney' IN (SELECT city from CITIES)

We still get the right result: Sidney is not in the list so no row is returned. Now to find out if Sidney is missing in the table, we would write something like that:

SELECT 'Sidney Not Found'
WHERE 'Sidney' NOT IN (SELECT city from CITIES)

However here something is definitevely wrong. Sidney is not in the list and still no rows is returned. Let’s try a different approach:

SELECT'Sidney Not Found'
WHERE 'Sidney' NOT IN ('Paris','Montreal','New York')

That one works. The null value affects the outcome of the NOT IN operator. This is because the operator compares each city in the list; the previous query is logically equivalent to the following query:

SELECT 'Sidney Not Found'
WHERE 'Sidney'<>'Paris'
AND 'Sidney'<>'Montreal'

We therefore get this logically equivalent query if we add a null value:

SELECT 'Sidney Not Found'
WHERE 'Sidney'<>'Paris'
AND 'Sidney'<>'Montreal'
AND 'Sidney'<>null

… and since, by default, “Sidney <>null” is UNKNOWN (neither true or false), no row is returned because every condition must be true in order for the AND operator to return a TRUE result. The┬ásame counter-intuitive result happens with the IN operator, like in this example:

SELECT city from CITIES
WHERE city in (select city from CITIES)

Here null is in the list disappeared, because NULL<>NULL.

When checking for existence, you should use the EXISTS operator if the columns involved are nullables. Using IN operator might produce an inferior plan and can lead to misleading results if a null value is inserted in the table. In our example, we can rewrite our query as:

SELECT 'Sidney Not Found'
WHERE NOT EXISTS
(SELECT 1/0 FROM CITIES WHERE CITY = 'Sidney')

The EXISTS operator returns TRUE if the subquery returns at least a row and FALSE otherwise. Also note that the columns returned by the subquery are never evaluated because there is no need to. That is why the previous query didn’t throw a “Divide by zero error”.


Comments

3 responses to “Using NOT IN operator with null values”

Leave your response
  1. scott strommen says:

    The following also works.

    SELECT’not Found SIDNEY’
    WHERE’Sidney’ NOT IN (SELECT city from CITIES WHERE CITY IS NOT NULL)

  2. Nisarg says:

    Really, It was very helpful. Thank you for nice explanations.

  3. JW says:

    I think the discussion of ANSI_NULLS should have more detail. The top 2 commented lines in the first query are not very specific, and could be misinterpreted (cannot change ANSI_NULLS value vs ANSI_NULLS value will be True).

    Per Microsoft TechNet: “In a future version of SQL Server, ANSI_NULLS will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.”

    http://technet.microsoft.com/en-us/library/ms188048(v=sql.105).aspx

    Also, it’s not clear from your test queries below the create table query if you’ve set ANSI_NULLS ON or OFF before executing the queries. Yes, the default setting for existing SQL server versions ANSI_NULLS OFF. But for this issue it’s important to specify whether it’s ON or OFF for the test queries after your create table queries.