Heap Tables

Heap tables (tables without a clustered index) are generally not part of a good database design.

Unless you never actually query your table, you should always put a clustered index on it. Heap table are generally slower on selects, updates and deletes. They are also generally slower on inserts if you decide to use a nonclustered index instead of a clustered index (which you shouldn’t do).

By default, Management Studio creates a clustered primary key so heap are sometimes created because no primary key has been defined on a table. A primary key is NOT the same thing as a clustered index. A primary key cannot have duplicate or null values but a clustered index certainly can.

Moreover, a heap table is way bigger and has generally high fragmentation. Fragmentation occurs on non heap tables but you can at least rebuild your index to get rid of it.

If you have a log table that almost never gets queried and you don’t care about disk space then you could use a heap table. If you need a table to insert large sets of data before creating a clustered index you should create a temporary heap table for performance on inserts. Otherwise, always define a clustered index on all your tables.

Nice article on performance of heap+index versus clustered index: http://msdn.microsoft.com/en-us/library/cc917672.


Comments

One response to “Heap Tables”

Leave your response
  1. I know, old post, but “I found something wrong on the internet” so must respond. 🙂
    Not a true statement: “Heap tables (tables without a clustered index) are generally not part of a good database design.”

    DB2, PostreSQL, and MyISAM doesn’t even support non-heap tables. Oracle supports clustered indexes (called IOT in Oracle-speak) but defaults to heap tables.

    Clustered index tables are great when you always want to get a row or a range of rows by the clustered index, but not so great when you have any other indexes. Every row found in non-clustered indexes has an extra index scan against the clustered index because the index cannot reliably store row location (since it changes frequently). As a general rule, a table should be clustered only if it has exactly one index (the cluster) or when the cluster index is the one used for range scans and lookups the majority of the time. Tables with many indexes should almost never be clustered because of the penalty to both reads (on non-clustered index lookups) and writes (to indexes that need to store a copy of the cluster as well as the data for their columns).

    Fortunately, Microsoft SQL is the most flexible database server available when it comes to clustered vs. heap tables. Not only does it support both heap and clustered tables, it allows any column — even nonunique columns — to be the clustered index.