Speeding up aggregates with indexed views

Ever so often, queries include some aggregates data. In a datawarehouse environment, they are frequently used for metric and Key Performance Indicator (KPI) calculations. Aggregates are also used a lot for reporting purposes and for statistical computation.

When a query perform poorly, our first instinct is to add an index to speed it up. Creating a view and indexing that view is often an overlooked solution. However, an indexed view might perform much better with less performance hit (on INSERT, UPDATE & DELETE) compared with an index. Enterprise edition of SQL Server is able to automatically use indexed views even when they are not referenced, just like indexes.

Let’s create a typical web log table with two million entries. This table contains the cookie_id of the visitor, the visit date and the transaction value if the user made a purchase during the visit.

-- Create table
IF OBJECT_ID('WEBLOG','U') IS NULL
BEGIN
CREATE TABLE [WEBLOG](
[COOKIE_ID] [int] NOT NULL,
[VISIT_DATE] [date] NOT NULL,
[TRANSACTION_VALUE] [money] NOT NULL
) ON [PRIMARY]

CREATE CLUSTERED INDEX [IX_VISIT_DATE] ON [WEBLOG]
( [VISIT_DATE] ASC ) ON [PRIMARY]
END

-- Empty the table
DELETE FROM WEBLOG

-- Insert 2 million rows
DECLARE @I INT = 0
DECLARE @HITS INT = 2000000
BEGIN TRANSACTION

WHILE @I<@HITS
BEGIN

INSERT INTO WEBLOG(COOKIE_ID,VISIT_DATE,TRANSACTION_VALUE)
SELECT RAND()*@HITS/100 -- COOKIE_ID
,DATEADD(day,CONVERT(INT,(RAND()*365)),'20110101') -- VISIT DATE
,CASE WHEN RAND()<0.01
THEN RAND()*10.0 ELSE 0.0 END -- TRANSACTION_VALUE

SET @I=@I+1
END

COMMIT TRANSACTION

We wish to report the number of visits, the lifetime value (total purchases) and revenue per visit of each cookie:

SELECT COOKIE_ID
, COUNT(*) AS FREQUENCY
, SUM(TRANSACTION_VALUE) AS LIFETIME_VALUE
, AVG(TRANSACTION_VALUE) AS REVENUE_PER_VISIT
FROM [DBO].WEBLOG
GROUP BY COOKIE_ID

On average, this query takes 600.4 milliseconds on my system.  We can speed it up using a nonclustered index:

CREATE NONCLUSTERED INDEX [IX_RFM] ON [WEBLOG] ([COOKIE_ID])
INCLUDE ( [TRANSACTION_VALUE]) ON [PRIMARY]

It now takes 336 milliseconds to run the query, a performance improvement of 44%. We can increase this performance gain by using an indexed view instead of creating the previous index:

IF OBJECT_ID('RFM','V') IS NOT NULL
DROP VIEW [RFM]
GO

-- Create view
CREATE VIEW [RFM] WITH SCHEMABINDING AS
SELECT COOKIE_ID, SUM(TRANSACTION_VALUE) AS MONETARY
,COUNT_BIG(*) AS FREQUENCY
FROM [DBO].WEBLOG
GROUP BY COOKIE_ID
GO

-- Create clustered index on view, making it a indexed view
CREATE UNIQUE CLUSTERED INDEX IDX_RFM_V ON [RFM] (COOKIE_ID);

The query now runs in 53.8 ms on average: a 91% performance gain. The thing is, fetching the data is almost instantaneous because the result of the view is materialized: the majority of the elapsed time is spent displaying the query results. Performance do not depend on the underlying table but rather on what can be fetched from the materialized results of the view (everything, in our case):

Indexed view execution plan

Query execution plan

Here, even if the average aggregate (AVG) is not defined in the view, the query optimizer is able to derive the result from the COUNT and the SUM aggregates. If the view gets big, you can also create nonclustered indexes on your view to speed-up access to subsets of your view. You get the best query performance gains if your underlying tables are large and your query results stay small (hence the benefit with aggregations).

The improved query speed comes with additional overhead when modifying table data (just like indexes). The following table displays a summary of my test results:

Indexed views performance

Indexed view is also a great way to improve INNER JOINS performance. When two or more table are prejoined in an indexed view, the query optimizer can choose to retrieve the materialized view data instead of performing a costly join operation.

For more information on indexed views, see the following Microsoft article: http://technet.microsoft.com/en-us/library/cc917715.asp.


Comments

3 responses to “Speeding up aggregates with indexed views”

Leave your response
  1. Peter says:

    This is an outstanding page with excellent and detail rich information – It should be required reading for all aspiring database developers!

  2. Kevin Roberts says:

    Thanks!

    This really helped me out.

  3. John says:

    I disagree with the praising of indexed views. They have many limitations; for example you can’t use MAX, MIN, can’t use OUTER JOINs and can’t reference other views.

    Creating an indexed view on an aggregate query that references multiple table (which is very common in the industry) makes the query very fragile. You may be able to speed the query up today and make the users happy but you’re living on a prayer that your requirements don’t change tomorrow because if they do, you will either be hard pressed to find an acceptable solution or you will experience huge performance degradation.