Saturday, October 27, 2007

Page it

We are most familiar with the data row. The row size is set only by the definition of the table that holds it (e.g. A table of addresses require more data per row then a table of class names). In SQL Server, a table may define a row as storing as little as 4 bytes to as much as 8060.This limit is set by the size of the data page, which stores up to 8,192 bytes (8 KB). The remaining 132 bytes are used by SQL Server to track other information under the covers. Although SQL Server is designed around 8 KB pages, the smallest unit of data that SQL Server can allocate is 64 KB. This is called an extent.

To store the data in a sorted order, as in a phone book, SQL Server uses something called a clustered index. When a typical database is created, clustered indexes exist on nearly all tables. However, just because the data exists in sorted order within the page does not mean that it exists as such within an extent. The reason for this derives from situations in which there is no more room on a given page in which it can insert a row. SQL Server then removes approximately half the page and moves it to another page, which is called a Page Split (Page Splits will not occur with clustered indexes on IDENTITY based columns, but hotspotting may). In some cases, it may move that data to another extent altogether, possibly even allocating a new extent to do so. So, while we start off with names beginning with A and ending with H on one page, and names beginning with I and ending with Z on the next page, through usage, we may see that names A through C are now located on one page in one extent, D through E on another extent and S through Z back on the fifth page of the first extent, etc. It is because of the page split that there are times in which we may prefer to use tables with no clustered indexes at all. However, these tables are usually scratch tables which are highly volatile. In those situations, we desire the quicker write times at the cost of slower reads.