SSIS Non cached Lookups without a covering index

The vast majority of the time, you will use SSIS Lookup component in Full Cache mode. This mode is the fastest because it queries the database only once (before the data flow starts) and apply hashing in order to do high-performance comparisons.

Sometimes however you will have to use Non Cached lookups. For example if your reference table doesn’t fit in memory or if you wish to lookup rows that you just inserted in your reference table at the beginning of your data flow. You might also run into cases where you need to do inequalities lookups or where you have very few rows at the source and you wish to lookup a table which has several million rows.

Because the Non Cached mode will query the database for each row, it has to be fast.

If your reference table is large and you omit to create a covering index, you will get very poor performance. A covering index, by definition, is an index that contains each column that is used by your query. Note that only the columns in your WHERE statement (the Lookup join columns) must be in the index, the other columns (Lookup reference columns) need only to be in the “Included Columns” of the index. Including your reference columns in the index makes it easy for SQL Server to get your reference values without going back to the table pages.

Let’s create a lookup table:

IF NOT EXISTS(select 1 from INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='LOOKUP_TABLE')
CREATE TABLE LOOKUP_TABLE(
[LOOKUP] [varchar](36) NULL,
[REFERENCE] [int] NOT NULL PRIMARY KEY CLUSTERED)

And add 1 million rows:

SET NOCOUNT ON

TRUNCATE TABLE LOOKUP_TABLE
DECLARE @I INT = 0
BEGIN TRAN

WHILE @I < 1000000
BEGIN
INSERT INTO LOOKUP_TABLE(LOOKUP,REFERENCE) SELECT NEWID(), @I
SET @I=@I+1
END

COMMIT TRAN

Select No Cache or Partial Cache in the SSIS Lookup Component dialog box:

SSIS Lookup No Cache

The SSIS Lookup query should be defined like this:

SSIS Lookup Query

And the SSIS Lookup Columns should be defined like this:

SSIS Lookup Columns

The covering index is then created this way:

CREATE NONCLUSTERED INDEX
[IX_NON_CLUSTERED_LOOKUP_INCLUDE_LOOKUP]
ON [LOOKUP_TABLE] ([LOOKUP] ASC)
INCLUDE ([REFERENCE])

We can appreciate the performance gain for 1000 rows:

Index Elapsed Time
Clustered Index Scan 37.3 sec
Covering Index 0.109 sec

Another way of judging the performance of the lookup is to display SQL Server query statistics for a single query:

Force SQL Server to use clustered index instead of covering index:

-- Clear memory cache and show some stats
DBCC DROPCLEANBUFFERS
set statistics io on
set statistics time on

-- Use default SSIS Lookup syntax
select * from (
SELECT [LOOKUP],[REFERENCE] FROM [LOOKUP_TABLE]
WITH (INDEX (0)) -- ...but tell SQL to scan the table instead of using the covering index
) [refTable]
where [refTable].[LOOKUP] = '77EE7751-6BC1-4088-874E-D8F4440BB01A'

Results:
Table ‘LOOKUP_TABLE’. Scan count 9, logical reads 6699, physical reads 152
CPU time = 296 ms, elapsed time = 624 ms.

Here SQL Server will use the covering index:

-- Clear memory cache and show some stats
DBCC DROPCLEANBUFFERS
set statistics io on
set statistics time on

-- Use default SSIS Lookup syntax
select * from (
SELECT [LOOKUP],[REFERENCE] FROM [LOOKUP_TABLE]
) [refTable]
where [refTable].[LOOKUP] = '77EE7751-6BC1-4088-874E-D8F4440BB01A'

Results:
Table ‘LOOKUP_TABLE’. Scan count 1, logical reads 3, physical reads 2
CPU time = 0 ms, elapsed time = 31 ms.