Every so often, queries include aggregated data. In a data warehouse environment, aggregates are frequently used for metric and Key Performance Indicator (KPI) calculations. They are also widely used for reporting and statistical computation.

When a query performs poorly, our first instinct is to add an index to speed it up. Creating a view and indexing that view is an often overlooked alternative. However, an indexed view might perform much better than a regular index, with a smaller performance hit on INSERT, UPDATE and DELETE statements. The Enterprise edition of SQL Server can even use indexed views automatically when they are not referenced in the query, just as it does with indexes.

Let’s create a typical web log table with two million entries. This table contains the cookie_id of the visitor, the visit date and the transaction value if the user made a purchase during the visit.

-- Create table
IF OBJECT_ID('WEBLOG', 'U') IS NULL
BEGIN
    CREATE TABLE [WEBLOG] (
        [COOKIE_ID] [int] NOT NULL,
        [VISIT_DATE] [date] NOT NULL,
        [TRANSACTION_VALUE] [money] NOT NULL
    ) ON [PRIMARY]

    CREATE CLUSTERED INDEX [IX_VISIT_DATE] ON [WEBLOG]
        ( [VISIT_DATE] ASC ) ON [PRIMARY]
END

-- Empty the table
DELETE FROM WEBLOG

-- Insert 2 million rows
DECLARE @I INT = 0
DECLARE @HITS INT = 2000000

BEGIN TRANSACTION

WHILE @I < @HITS
BEGIN
    INSERT INTO WEBLOG (COOKIE_ID, VISIT_DATE, TRANSACTION_VALUE)
    SELECT RAND() * @HITS / 100                                  -- COOKIE_ID
        , DATEADD(day, CONVERT(INT, (RAND() * 365)), '20110101') -- VISIT_DATE
        , CASE WHEN RAND() < 0.01
               THEN RAND() * 10.0 ELSE 0.0 END                   -- TRANSACTION_VALUE

    SET @I = @I + 1
END

COMMIT TRANSACTION
 

We wish to report the number of visits, the lifetime value (total purchases) and revenue per visit of each cookie:

 SELECT COOKIE_ID
 , COUNT(*) AS FREQUENCY
 , SUM(TRANSACTION_VALUE) AS LIFETIME_VALUE
 , AVG(TRANSACTION_VALUE) AS REVENUE_PER_VISIT
 FROM [DBO].WEBLOG
 GROUP BY COOKIE_ID
 

On average, this query takes 600.4 milliseconds on my system.  We can speed it up using a nonclustered index:

 CREATE NONCLUSTERED INDEX [IX_RFM] ON [WEBLOG] ([COOKIE_ID])
 INCLUDE ( [TRANSACTION_VALUE]) ON [PRIMARY]
 

It now takes 336 milliseconds to run the query, a performance improvement of 44%. We can increase this performance gain by using an indexed view instead of creating the previous index:

 IF OBJECT_ID('RFM','V') IS NOT NULL
 DROP VIEW [RFM]
 GO
 
 -- Create view
 CREATE VIEW [RFM] WITH SCHEMABINDING AS
 SELECT COOKIE_ID, SUM(TRANSACTION_VALUE) AS MONETARY
 ,COUNT_BIG(*) AS FREQUENCY
 FROM [DBO].WEBLOG
 GROUP BY COOKIE_ID
 GO
 
 -- Create clustered index on view, making it an indexed view
 CREATE UNIQUE CLUSTERED INDEX IDX_RFM_V ON [RFM] (COOKIE_ID);
 

The query now runs in 53.8 ms on average: a 91% performance gain. In fact, fetching the data is almost instantaneous because the result of the view is materialized; the majority of the elapsed time is spent displaying the query results. Performance no longer depends on the underlying table but rather on what can be fetched from the materialized results of the view (everything, in our case):

Indexed view execution plan

Query execution plan

Here, even though the average aggregate (AVG) is not defined in the view, the query optimizer is able to derive the result from the COUNT and SUM aggregates. If the view gets big, you can also create nonclustered indexes on it to speed up access to subsets of the view. You get the best query performance gains when your underlying tables are large and your query results stay small (hence the benefit for aggregations).
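As noted above, only the Enterprise edition matches indexed views automatically. On other editions (or to force the behavior), you can reference the view directly with the NOEXPAND table hint; a minimal sketch against the RFM view defined above:

```sql
-- NOEXPAND forces the optimizer to use the view's index
-- instead of expanding the view to its underlying table
SELECT COOKIE_ID
    , FREQUENCY
    , MONETARY AS LIFETIME_VALUE
    , MONETARY / FREQUENCY AS REVENUE_PER_VISIT
FROM [RFM] WITH (NOEXPAND)
```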

The improved query speed comes with additional overhead when modifying table data (just like indexes). The following table displays a summary of my test results:

Indexed views performance

Indexed views are also a great way to improve INNER JOIN performance. When two or more tables are prejoined in an indexed view, the query optimizer can choose to retrieve the materialized view data instead of performing a costly join operation.
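As a sketch, assuming a hypothetical CUSTOMER table keyed by COOKIE_ID with a REGION column (neither the table nor the columns come from this article), a prejoined indexed view could look like this:

```sql
-- Prejoin WEBLOG with a hypothetical CUSTOMER table;
-- SCHEMABINDING and two-part names are required for indexed views,
-- and COUNT_BIG(*) is mandatory when GROUP BY is used
CREATE VIEW [CUSTOMER_LOG] WITH SCHEMABINDING AS
SELECT W.COOKIE_ID
    , C.REGION
    , SUM(W.TRANSACTION_VALUE) AS MONETARY
    , COUNT_BIG(*) AS FREQUENCY
FROM [DBO].WEBLOG W
INNER JOIN [DBO].CUSTOMER C ON C.COOKIE_ID = W.COOKIE_ID
GROUP BY W.COOKIE_ID, C.REGION
GO

CREATE UNIQUE CLUSTERED INDEX IDX_CUSTOMER_LOG
    ON [CUSTOMER_LOG] (COOKIE_ID, REGION);
```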

For more information on indexed views, see the following Microsoft article: http://technet.microsoft.com/en-us/library/cc917715.asp.

Management Studio provides a neat GUI to define maintenance plans for your SQL databases. This tool is available under Management -> Maintenance Plans and provides many Maintenance Plan Tasks that you can configure easily, such as “Check Database Integrity Task”, “Back Up Database Task” and “Rebuild Index Task”. (more…)

Linked servers allow you to issue SQL commands against OLE DB providers. Because Microsoft also provides an “OLE DB Provider for ODBC Drivers”, it is also possible to issue queries against a variety of ODBC drivers. With linked servers and distributed queries, you can query all sorts of data sources and merge them on the fly with your SQL Server database. (more…)
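As a minimal sketch (the linked server name REMOTESRV and the remote table are hypothetical), a distributed query can use four-part naming or OPENQUERY:

```sql
-- Four-part naming: server.database.schema.object
SELECT * FROM REMOTESRV.SalesDb.dbo.Orders;

-- OPENQUERY sends the statement to the remote server for execution there
SELECT * FROM OPENQUERY(REMOTESRV, 'SELECT OrderId, Total FROM dbo.Orders');
```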

The main reason why Microsoft introduced table variables in SQL Server 2000 was to reduce stored procedure recompilations (a recompilation occurs when the stored procedure execution plan is recreated). Table variables also have a well-defined scope (the current procedure or function) and they induce less logging and locking (because their transactions last for a single SQL statement). These are great advantages when dealing with short, simpler OLTP-style queries and processes. (more…)
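A minimal sketch of declaring and using a table variable (the table and column names are illustrative):

```sql
-- Table variables are declared like local variables, not created as objects
DECLARE @RecentVisits TABLE (
    COOKIE_ID INT NOT NULL,
    VISIT_DATE DATE NOT NULL
);

INSERT INTO @RecentVisits (COOKIE_ID, VISIT_DATE)
VALUES (1, '20110101'), (2, '20110102');

-- The variable goes out of scope at the end of the batch or procedure
SELECT COOKIE_ID, VISIT_DATE FROM @RecentVisits;
```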

The IN operator compares a value with a list of values. It must, however, be used with care when dealing with NULLs.

Let’s create a table containing three city names and a NULL value. The goal is to check whether a city is in the list or not. (more…)

A query plan is a set of steps generated by the database engine to retrieve data. Query plans are produced by the query optimizer from SQL statements.

SQL Server automatically caches query plans and tries to reuse them whenever possible. For many applications (such as OLTP transactional applications), plan reuse is a very good thing since it avoids unneeded compilations, which can take significant time each time a query is executed. SQL Server caches query plans (execution plans based on parameter assumptions) but not execution contexts (execution plans based on the actual parameter values). (more…)
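You can inspect the plan cache through the sys.dm_exec_cached_plans dynamic management view; a minimal sketch:

```sql
-- List cached plans with their reuse counts and originating SQL text
SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
ORDER BY cp.usecounts DESC;
```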

On occasion I have seen database administrators enable the SQL Server “Boost SQL Server priority” option. This option is available in the Server Properties window under Processors: (more…)

The default value for SQL Server 2008 Maximum Server Memory setting is 2,147,483,647 MB (or 2.1 petabytes!). Therefore, by default, SQL Server will use all available memory for its own use: (more…)
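The setting can be capped with sp_configure; a sketch that limits SQL Server to 4 GB (the value is in MB and should be adjusted to your host):

```sql
-- 'max server memory (MB)' is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;
```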

The vast majority of the time, you will use the SSIS Lookup component in Full Cache mode. This mode is the fastest because it queries the database only once (before the data flow starts) and applies hashing in order to do high-performance comparisons.

Sometimes, however, you will have to use non-cached lookups. For example, if your reference table doesn’t fit in memory, or if you wish to look up rows that you just inserted into your reference table at the beginning of your data flow. You might also run into cases where you need to do inequality lookups, or where you have very few rows at the source and you wish to look up a table which has several million rows. (more…)

It is a common business case to have to reuse SQL Server’s auto-generated IDENTITY value. One way to deal with the problem is to use the system function @@IDENTITY. For example: (more…)
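A minimal sketch (the ORDERS table is hypothetical); note that SCOPE_IDENTITY() is generally safer than @@IDENTITY because it is not affected by identity values generated in triggers:

```sql
-- Hypothetical table with an IDENTITY column
CREATE TABLE [ORDERS] (
    ORDER_ID INT IDENTITY(1,1) PRIMARY KEY,
    COOKIE_ID INT NOT NULL
);

INSERT INTO [ORDERS] (COOKIE_ID) VALUES (42);

-- @@IDENTITY returns the last identity from any scope (including triggers);
-- SCOPE_IDENTITY() is limited to the current scope
SELECT @@IDENTITY       AS LAST_IDENTITY_ANY_SCOPE
     , SCOPE_IDENTITY() AS LAST_IDENTITY_THIS_SCOPE;
```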

Heap tables (tables without a clustered index) are generally not part of a good database design.

Unless you never actually query your table, you should always put a clustered index on it. Heap tables are generally slower on selects, updates and deletes. They are also generally slower on inserts if you decide to use a nonclustered index instead of a clustered index (which you shouldn’t do). (more…)
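Turning a heap into a clustered table is a single statement; a sketch assuming a hypothetical ORDERS heap keyed by ORDER_ID:

```sql
-- Creating a clustered index physically reorders the heap's rows by the key
CREATE CLUSTERED INDEX IX_ORDERS_ORDER_ID ON [ORDERS] (ORDER_ID);
```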