Every so often, queries include aggregated data. In a data warehouse environment, aggregates are frequently used for metric and Key Performance Indicator (KPI) calculations. They are also widely used for reporting and statistical computation.

When a query performs poorly, our first instinct is to add an index to speed it up. Creating a view and indexing that view is an often overlooked alternative. However, an indexed view might perform much better than a regular index, with a smaller performance hit on INSERT, UPDATE and DELETE statements. The Enterprise edition of SQL Server can even use indexed views automatically when they are not referenced in the query, just as it does with indexes.

Let’s create a typical web log table with two million entries. This table contains the cookie_id of the visitor, the visit date and the transaction value if the user made a purchase during the visit.

-- Create table
IF OBJECT_ID('WEBLOG', 'U') IS NULL
BEGIN
    CREATE TABLE [WEBLOG] (
        [COOKIE_ID] [int] NOT NULL,
        [VISIT_DATE] [date] NOT NULL,
        [TRANSACTION_VALUE] [money] NOT NULL
    ) ON [PRIMARY]

    CREATE CLUSTERED INDEX [IX_VISIT_DATE] ON [WEBLOG]
        ( [VISIT_DATE] ASC ) ON [PRIMARY]
END

-- Empty the table
DELETE FROM WEBLOG

-- Insert 2 million rows
DECLARE @I INT = 0
DECLARE @HITS INT = 2000000

BEGIN TRANSACTION

WHILE @I < @HITS
BEGIN
    INSERT INTO WEBLOG (COOKIE_ID, VISIT_DATE, TRANSACTION_VALUE)
    SELECT RAND() * @HITS / 100                                  -- COOKIE_ID
        , DATEADD(day, CONVERT(INT, (RAND() * 365)), '20110101') -- VISIT_DATE
        , CASE WHEN RAND() < 0.01
               THEN RAND() * 10.0 ELSE 0.0 END                   -- TRANSACTION_VALUE

    SET @I = @I + 1
END

COMMIT TRANSACTION
 

We wish to report the number of visits, the lifetime value (total purchases) and revenue per visit of each cookie:

 SELECT COOKIE_ID
 , COUNT(*) AS FREQUENCY
 , SUM(TRANSACTION_VALUE) AS LIFETIME_VALUE
 , AVG(TRANSACTION_VALUE) AS REVENUE_PER_VISIT
 FROM [DBO].WEBLOG
 GROUP BY COOKIE_ID
 

On average, this query takes 600.4 milliseconds on my system.  We can speed it up using a nonclustered index:

 CREATE NONCLUSTERED INDEX [IX_RFM] ON [WEBLOG] ([COOKIE_ID])
 INCLUDE ( [TRANSACTION_VALUE]) ON [PRIMARY]
 

It now takes 336 milliseconds to run the query, a performance improvement of 44%. We can increase this performance gain by using an indexed view instead of creating the previous index:

 IF OBJECT_ID('RFM','V') IS NOT NULL
 DROP VIEW [RFM]
 GO
 
 -- Create view
 CREATE VIEW [RFM] WITH SCHEMABINDING AS
 SELECT COOKIE_ID, SUM(TRANSACTION_VALUE) AS MONETARY
 ,COUNT_BIG(*) AS FREQUENCY
 FROM [DBO].WEBLOG
 GROUP BY COOKIE_ID
 GO
 
 -- Create clustered index on view, making it an indexed view
 CREATE UNIQUE CLUSTERED INDEX IDX_RFM_V ON [RFM] (COOKIE_ID);
 

The query now runs in 53.8 ms on average: a 91% performance gain. In fact, fetching the data is almost instantaneous because the result of the view is materialized; the majority of the elapsed time is spent displaying the query results. Performance no longer depends on the underlying table but rather on what can be fetched from the materialized results of the view (everything, in our case):

Indexed view execution plan

Query execution plan

Here, even though the average aggregate (AVG) is not defined in the view, the query optimizer is able to derive the result from the COUNT and SUM aggregates. If the view gets big, you can also create nonclustered indexes on it to speed up access to subsets of the view. You get the best query performance gains when your underlying tables are large and your query results stay small (hence the benefit for aggregations).
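As noted above, only the Enterprise edition matches indexed views automatically. On other editions (or to force the behavior), you can reference the view directly with the NOEXPAND table hint; a minimal sketch against the RFM view defined above:

```sql
-- NOEXPAND forces the optimizer to use the view's index
-- instead of expanding the view to its underlying table
SELECT COOKIE_ID
    , FREQUENCY
    , MONETARY AS LIFETIME_VALUE
    , MONETARY / FREQUENCY AS REVENUE_PER_VISIT
FROM [RFM] WITH (NOEXPAND)
```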

The improved query speed comes with additional overhead when modifying table data (just like indexes). The following table displays a summary of my test results:

Indexed views performance

Indexed views are also a great way to improve INNER JOIN performance. When two or more tables are prejoined in an indexed view, the query optimizer can choose to retrieve the materialized view data instead of performing a costly join operation.
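As a sketch, assuming a hypothetical CUSTOMER table keyed by COOKIE_ID with a REGION column (neither the table nor the columns come from this article), a prejoined indexed view could look like this:

```sql
-- Prejoin WEBLOG with a hypothetical CUSTOMER table;
-- SCHEMABINDING and two-part names are required for indexed views,
-- and COUNT_BIG(*) is mandatory when GROUP BY is used
CREATE VIEW [CUSTOMER_LOG] WITH SCHEMABINDING AS
SELECT W.COOKIE_ID
    , C.REGION
    , SUM(W.TRANSACTION_VALUE) AS MONETARY
    , COUNT_BIG(*) AS FREQUENCY
FROM [DBO].WEBLOG W
INNER JOIN [DBO].CUSTOMER C ON C.COOKIE_ID = W.COOKIE_ID
GROUP BY W.COOKIE_ID, C.REGION
GO

CREATE UNIQUE CLUSTERED INDEX IDX_CUSTOMER_LOG
    ON [CUSTOMER_LOG] (COOKIE_ID, REGION);
```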

For more information on indexed views, see the following Microsoft article: http://technet.microsoft.com/en-us/library/cc917715.asp.

Management Studio provides a neat GUI to define maintenance plans for your SQL databases. This tool is available under Management -> Maintenance Plans and provides many Maintenance Plan Tasks that you can configure easily, such as “Check Database Integrity Task”, “Back Up Database Task” and “Rebuild Index Task”. (more…)

Linked servers allow you to issue SQL commands against OLE DB providers. Because Microsoft also provides an “OLE DB Provider for ODBC Drivers”, it is also possible to issue queries against a variety of ODBC drivers. With linked servers and distributed queries, you can query all sorts of data sources and merge them on the fly with your SQL Server database. (more…)
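As a minimal sketch (the linked server name REMOTESRV and the remote table are hypothetical), a distributed query can use four-part naming or OPENQUERY:

```sql
-- Four-part naming: server.database.schema.object
SELECT * FROM REMOTESRV.SalesDb.dbo.Orders;

-- OPENQUERY sends the statement to the remote server for execution there
SELECT * FROM OPENQUERY(REMOTESRV, 'SELECT OrderId, Total FROM dbo.Orders');
```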

The main reason why Microsoft introduced table variables in SQL Server 2000 was to reduce stored procedure recompilations (a recompilation occurs when the stored procedure execution plan is recreated). Table variables also have a well-defined scope (the current procedure or function) and they induce less logging and locking (because their transactions last for a single SQL statement). These are great advantages when dealing with short, simpler OLTP-style queries and processes. (more…)
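A minimal sketch of declaring and using a table variable (the table and column names are illustrative):

```sql
-- Table variables are declared like local variables, not created as objects
DECLARE @RecentVisits TABLE (
    COOKIE_ID INT NOT NULL,
    VISIT_DATE DATE NOT NULL
);

INSERT INTO @RecentVisits (COOKIE_ID, VISIT_DATE)
VALUES (1, '20110101'), (2, '20110102');

-- The variable goes out of scope at the end of the batch or procedure
SELECT COOKIE_ID, VISIT_DATE FROM @RecentVisits;
```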

The IN operator compares a value with a list of values. It must, however, be used with care when dealing with NULLs.

Let’s create a table containing three city names and a NULL value. The goal is to check whether a city is in the list or not. (more…)

A query plan is a set of steps generated by the database engine to retrieve data. Query plans are produced by the query optimizer from SQL statements.

SQL Server automatically caches query plans and tries to reuse them whenever possible. For many applications (such as OLTP transactional applications), plan reuse is a very good thing since it avoids unneeded compilations, which can take significant time each time a query is executed. SQL Server caches query plans (execution plans based on parameter assumptions) but not execution contexts (execution plans based on the actual parameter values). (more…)
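You can inspect the plan cache through the sys.dm_exec_cached_plans dynamic management view; a minimal sketch:

```sql
-- List cached plans with their reuse counts and originating SQL text
SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
ORDER BY cp.usecounts DESC;
```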

On occasion I have seen database administrators enable the SQL Server “Boost SQL Server priority” option. This option is available in the Server Properties window under Processors: (more…)

The default value for SQL Server 2008 Maximum Server Memory setting is 2,147,483,647 MB (or 2.1 petabytes!). Therefore, by default, SQL Server will use all available memory for its own use: (more…)
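The setting can be capped with sp_configure; a sketch that limits SQL Server to 4 GB (the value is in MB and should be adjusted to your host):

```sql
-- 'max server memory (MB)' is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;
```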

The vast majority of the time, you will use the SSIS Lookup component in Full Cache mode. This mode is the fastest because it queries the database only once (before the data flow starts) and applies hashing in order to do high-performance comparisons.

Sometimes, however, you will have to use non-cached lookups. For example, if your reference table doesn’t fit in memory, or if you wish to look up rows that you just inserted into your reference table at the beginning of your data flow. You might also run into cases where you need to do inequality lookups, or where you have very few rows at the source and you wish to look up a table which has several million rows. (more…)

It is a common business case to have to reuse SQL Server’s auto-generated IDENTITY value. One way to deal with the problem is to use the system function @@IDENTITY. For example: (more…)
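A minimal sketch (the ORDERS table is hypothetical); note that SCOPE_IDENTITY() is generally safer than @@IDENTITY because it is not affected by identity values generated in triggers:

```sql
-- Hypothetical table with an IDENTITY column
CREATE TABLE [ORDERS] (
    ORDER_ID INT IDENTITY(1,1) PRIMARY KEY,
    COOKIE_ID INT NOT NULL
);

INSERT INTO [ORDERS] (COOKIE_ID) VALUES (42);

-- @@IDENTITY returns the last identity from any scope (including triggers);
-- SCOPE_IDENTITY() is limited to the current scope
SELECT @@IDENTITY       AS LAST_IDENTITY_ANY_SCOPE
     , SCOPE_IDENTITY() AS LAST_IDENTITY_THIS_SCOPE;
```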

Heap tables (tables without a clustered index) are generally not part of a good database design.

Unless you never actually query your table, you should always put a clustered index on it. Heap tables are generally slower on selects, updates and deletes. They are also generally slower on inserts if you decide to use a nonclustered index instead of a clustered index (which you shouldn’t do). (more…)
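Turning a heap into a clustered table is a single statement; a sketch assuming a hypothetical ORDERS heap keyed by ORDER_ID:

```sql
-- Creating a clustered index physically reorders the heap's rows by the key
CREATE CLUSTERED INDEX IX_ORDERS_ORDER_ID ON [ORDERS] (ORDER_ID);
```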