Heap Tables

Heap tables (tables without a clustered index) are generally not part of a good database design.

Unless you never actually query your table, you should always put a clustered index on it. Heap table are generally slower on selects, updates and deletes. They are also generally slower on inserts if you decide to use a nonclustered index instead of a clustered index (which you shouldn’t do).

By default, Management Studio creates a clustered primary key so heap are sometimes created because no primary key has been defined on a table. A primary key is NOT the same thing as a clustered index. A primary key cannot have duplicate or null values but a clustered index certainly can.

Moreover, a heap table is way bigger and has generally high fragmentation. Fragmentation occurs on non heap tables but you can at least rebuild your index to get rid of it.

If you have a log table that almost never gets queried and you don’t care about disk space then you could use a heap table. If you need a table to insert large sets of data before creating a clustered index you should create a temporary heap table for performance on inserts. Otherwise, always define a clustered index on all your tables.

Nice article on performance of heap+index versus clustered index: http://msdn.microsoft.com/en-us/library/cc917672.


Comments

2 responses to “Heap Tables”

Leave your response
  1. I know, old post, but “I found something wrong on the internet” so must respond. 🙂
    Not a true statement: “Heap tables (tables without a clustered index) are generally not part of a good database design.”

    DB2, PostreSQL, and MyISAM doesn’t even support non-heap tables. Oracle supports clustered indexes (called IOT in Oracle-speak) but defaults to heap tables.

    Clustered index tables are great when you always want to get a row or a range of rows by the clustered index, but not so great when you have any other indexes. Every row found in non-clustered indexes has an extra index scan against the clustered index because the index cannot reliably store row location (since it changes frequently). As a general rule, a table should be clustered only if it has exactly one index (the cluster) or when the cluster index is the one used for range scans and lookups the majority of the time. Tables with many indexes should almost never be clustered because of the penalty to both reads (on non-clustered index lookups) and writes (to indexes that need to store a copy of the cluster as well as the data for their columns).

    Fortunately, Microsoft SQL is the most flexible database server available when it comes to clustered vs. heap tables. Not only does it support both heap and clustered tables, it allows any column — even nonunique columns — to be the clustered index.

    • Sergey Kuznetsov says:

      Charles,

      Exactly my point to the point!
      People sharing someones (probably flawed) research on old hardware and old version of SQL Server.
      My point in support of Heap Tables is that if someone will have a big compound cluster key it will increase size of non-clustered indexes a lot therefore it will loose a lot of disk space.
      Also to find the data leaf in clustered index you will have to go the path from root page down to a leaf page therefore it will take longer and increase a chance to meet a lock or latch and wait on it for some time.
      In case of heap table every index already have a information where to find a data as it stored in 8 bytes in the format FileID:PageID:SlotID which will bring you to the data way faster than in the clustered table case.
      This is the reason why heap pages behave better under the pressure when lots of the queries goes against a hot data and it was shown in the above mentioned report.
      We have a very heavy load with thousands of parallel requests per second so we see how clustered tables behave versus heap tables.