Tuesday, March 30, 2010

Notes on Index Rebuild/Reorganize

you should rebuild when an index is over 30% fragmented and reorganize when an index is between 10% and 30% fragmented. If fragmentation is below 10%, you can probably just leave that index alone. Rebuilding takes more server resources (and uses locks unless you use the ONLINE option available in 2005 Enterprise and Development editions), so reorg when possible. Syntax below to rebuild or reorg all indexes on a table named tTable follows…

ALTER INDEX ALL ON dbo.tTable REBUILD
GO

ALTER INDEX ALL ON dbo.tTable REORGANIZE
GO



The differences between the two:

1) Index rebuild re-creates the index internally again and when that has been achieved, it drops the existing index.(Therefore the rebuild needs space for storing the new index.) Index reorganize is the process of physically re-organizing the leaf nodes of the index.

2) During index rebuild, the statistics are also re-computed – same as when a new index gets created. Reorganize on the other hand does not update the statistics. Reorganize essentially just swaps one page with another and thus does not require free space for this operation like rebuild does. Infact, reorganize can free up some pages as it does the reorg in two phases – compaction and defrag. A reorganize can remove almost all of the logical fragmentation but it cannot necessarily fix extent fragmentation in which the previous and the next extents are physically contiguous.

3) Another point to note is that an index (clustered or non-clustered) cannot be built online if it contains LOB data (text, ntext, varchar(max), nvarchar(max), varbinary(max), image and xml data type columns). The ALTER INDEX…REORGANIZE command shown above is the same as DBCC INDEXDEFRAG but there is one difference. ALTER INDEX…REORGANIZE has some additional features like large objects compaction (LOB_COMPACTION). And this is an online operation.

4) Regarding partitions of an index, if an index has multiple partitions, then you cannot rebuild a single partition online. You can reorganize a single index partition online. If you want to rebuild an index with multiple partitions in an online environment, you need to rebuild the entire index which means rebuilding all the partitions.


Reorganizing an index defragments the leaf level of clustered and nonclustered indexes on tables and views by physically reordering the leaf-level pages to match the logical order (left to right) of the leaf nodes. Having the pages in order improves index-scanning performance. The index is reorganized within the existing pages allocated to it; no new pages are allocated. If an index spans more than one file, the files are reorganized one at a time. Pages do not migrate between files.

Reorganizing also compacts the index pages. Any empty pages created by this compaction are removed providing additional available disk space. Compaction is based on the fill factor value in the sys.indexes catalog view.

The reorganize process uses minimal system resources. Also, reorganizing is automatically performed online. The process does not hold long-term blocking locks; therefore, it will not block running queries or updates.

Reorganize an index when the index is not heavily fragmented. See the previous table for fragmentation guidelines. However, if the index is heavily fragmented, you will achieve better results by rebuilding the index.

http://technet.microsoft.com/en-us/library/ms189858.aspx





----------------
further notes - to rework


Types of searches

* Full scans
* Binary searches
* Linear interpolation - fixed record size, monotonely increasing key

Singleton Lookup

* Index Seek / Clustered Index Seek
* Find record using index pages

Allocation Order Scan

* Table Scan or unordered Clustered/Index Scan
* Pages splits during allocation order scans
* Table lock helps (TABLOCK, TABLOCKX) but not required

Advanced scanning

* Also known as "Merry-go-round" scan
* Multiple scans on the same table
* First scan starts, second scan joins in the middle, later scans the rest
* Enterprise edition only
* allocation order scans only
* See http://www.sqlmag.com/Article/ArticleID/49285/sql_server_49285.html
* See http://msdn2.microsoft.com/en-us/library/ms191475.aspx

Range Scan

* Index Scan / Clustered Index Scan
* Find first, then follw next page pointer

Readahead

* Pre-reading of pages in the leaf level during Range Scans
* 1, 8 or 32 pages (8KB, 64KB or 256KB) - only if pages are contiguous
* 128 pages (1024KB) only in Enterprise edition
* Logical fragmentation prevents optimal readahead
* See http://msdn.microsoft.com/en-us/library/ms191475.aspx

Logical Fragmentation

* Next page pointer does not point to next physical page
* DBCC SHOWCONTIG is deprecated now
* Look at avg_fragmentation_in_percent in sys.dm_db_index_physical_stats
* Not relevant if you have a small number of pages (less than 100?)
* Not relevant if all pages are already in the buffer pool (in cache)
* Logical IOs are not necessarily physical IOs

When to do something about it

* When to rebuild/reorganize? It always depends...
* Different for data warehouse or OLTP, for instance
* If you must have a number: 5-30% - ALTER INDEX REORGANIZE
* If you must have a number: >30% - ALTER INDEX REBUILD WITH (ONLINE = ON)*
* See http://msdn.microsoft.com/en-us/library/ms189858.aspx
* Many just rebuild everything every week (could be totally unnecessary)

Extent Fragmentation

* Extents are not contiguous
* Affects readahead, but not much (can read 8 but not 32 pages)
* In a heap, logical fragmentation is actually extent fragmentation
* With multiple files, round robin does not necessarily mean fragmentation
* Interesting KB: http://support.microsoft.com/kb/329526

Page Density

* How much space is actually in use in the page
* Affected by row size (5K fixed-length row leaves 3K unused in every page)
* Affected by page splits
* Affected by FILLFACTOR
* In the end, low page density leads to more IOs
* This is actually a kind of fragmentation

Page Splits

* Page is full and you need to add something (INSERT, UPDATE with larger data)
* Take half the rows to a new page, fix up all the pointers in both
* Could lead to page splits to non-leaf levels, all the way up to the root
* All fully logged, not matter what log level you are in
* Typically creates two low-density pages and are not contiguous
* Does not happen with a heap, only with a clustered index
* Does not happen with a key that is ever increasing (like identity)

Things that cause fragmentation

* GUID as high order key (NEWSEQUENTIALID can help)
* Variable lenght colums
* Misconfigured FILLFACTOR
* Wide rows (with 5K rows being the typical example)
* Clustered indexes that cause random insertion order (sales with customer,date)

FILLFACTOR

* Leave space on pages to avoid page splits in leaf level
* Applies only on rebuild, not when doing INSERT/UPDATE
* PAD_INDEX will do the same thing for non-leaf level
* For data warehouse, no need for FILLFACTOR
* For OLTP, it's hard to calculate the ideal factor, no quick answer
* Look at how many splits, how frequently you can rebuild
* Pick a value, watch how it goes. Then change factor or rebuild frequency
* If you must choose a number, try 70%

Symptoms of at Fragmentation

* Longer run times
* More disk activity (SET STATISTICS IO ON)
* More logging activity

Looking at Fragmentation

* Query sys.dm_db_index_physical_stats (replaces DBCC SHOWCONTIG)
* Look at avg_fragmentation_in_percent for fragmentation
* Look at avg_page_space_used_in_percent for page density
* You can summarize the results, join with sys.indexes to show index names
* See sample query at http://www.sqlskills.com/blogs/paul/post/Can-GUID-cluster-keys-cause-non-clustered-index-fragmentation.aspx

Querying sys.dm_db_index_physical_stats

* It could be expensive, since it bring lots of pages into the buffer pool
* DETAILED (reads everything)
* SAMPLED (reads 1% of the pages if less than 10,000 pages)
* LIMITED (parent level of b-tree, same as DBCC SHOWCONTIG WITH FAST)
* Limited can't tell you page density, since it does not read the page level

Fixing fragmentation

* Three main choices
* Rebuild - ALTER INDEX ... REBUILD (replaces DBCC REINDEX)
* Recreate - CREATE INDEX WITH DROP_EXISTING
* Reorganize - ALTER INDEX ... REORGANIZE (replaces DBCC INDEXDEFRAG)
* DROP, CREATE (separately) not good due to possible issue with constraints
* You can also decide not to do anything at all
* You don't necessarily have to rebuild everything every day...

Rebuild

* Can be done online (unless you have LOB columns)
* Can use multiple CPUs (control with MAXDOP)
* Works single partition or all partitions
* Rebuilds index statistics (equivalent of a full scan)
* Does not update column statistics
* Can be minimally logged
* Atomic. If you interrupt, it rolls back.
* Online - short-hold Shared lock, short-hold SCHema_Mod lock
* Offline clustered - eXclusive lock
* Offline non-clustered - Shared lock
* Creates new before dropping old. Reads from the old index.
* Always rebuilds everything. Faster for largely fragmented indexes.

Online Index Rebuild

* 1) Preparation - New index is created and set to write-only
* very short Shared lock
* 2) Build - Data is inserted from source.
* Two copies of the data being updated
* Scans know they shouldn't use the other - "anti-matter" record
* 3) Final - Index metadata is updated.
* short-hold SCHema_Mod lock
* Old data is dropped
* Details at http://msdn.microsoft.com/en-us/library/ms191261.aspx

Recreate

* Basically same as rebuild
* Can move to a new location
* Can change the schema
* Good way to do a shrink without using shrink

Reorganize

* Is always online (even if you have LOB columns)
* Always single threaded
* Works single partition or all partitions
* Table IX lock - locks only TABLOCKs and escalated locks
* Addresses only existing fragmentation
* Faster for lightly fragemented indexes
* Does not use much extra space

Reorganize Phase 1 - Page compaction

* Moves data from neighbor pages to get closer to FILLFACTOR
* Leaf level only. Works on a sliding window (a few pages at a time)
* Tries eXclusive lock on the pages involved. If can't, moves on.
* If it can empty pages, it will deallocated them.
* It will also remove ghost records.

Reorganize Phase 2 - Page defragment

* Make logical order the same as the allocation order
* Leaf level only. Shuffle pages arounds without using extra space.
* Uses one extra free page (page type 19 - unlinked reorg page)
* Locks enough pages to get the operation done without blocking others.
* Reminded me of the "towers of hanoi" problem :-)

Wednesday, March 17, 2010

Statistics basic

To list all of the statistics being maintained on a table-

select * from sys.stats where object_id = object_id('brkprd')


To retrieve detailed information about an index or statistic

dbcc show_statistics ('brkprd','IX_Brkprd_AcctIdIsNonZeroBP')

More on indexes

A clustered index is an index whose leaf nodes, that is the lowest level of the index, contain the actual data pages of the underlying table.

By default, a primary key constraint creates a clustered index.

In a nonclustered index, the lowest level of the index does not contain the data page of the table. Instead, it contains the information that allows SQL Server to navigate to the data pages it needs. For tables that have a clustered index, the leaf node of the nonclustered index contains the clustered index keys. In the previous example, the leaf node of a nonclustered index on the Customers table would contain the Customer_ID key.

If the underlying table does not have a clustered index (this data structure is known as a heap), the leaf node of the nonclustered index contains a row locator to the heap data pages.

Friday, March 12, 2010

Sorting Number Strings

/*

create table t (a varchar(10))
insert t select '1a1'
insert t select '4*0'
insert t select '1aw0'
insert t select 's20'
insert t select 't02'
insert t select '11r1'
insert t select '11v4g7@4/'

1
02
4
10
11
20
111

select * from t order by len(a),a
a98a3455


select * from t order by a
select a,dbo.fn_StringNumSort(a) from t order by dbo.fn_StringNumSort(a)

select dbo.fn_StringNumSort('102a')

*/

Alter Function dbo.fn_StringNumSort
(
@mStringVar Varchar(max)
)
Returns Varchar(max)
As
Begin


--1. Shred chars and numbers
--2. Concat 10000000000 to numbers so that the total length stays 11 - Substring('10000000000', 1, 11-Len(Set of numbers))
--3. Cases - abc13ab, a1bc13a1b1, 1, a, 02, 1, 10

-- declare @mStringVar varchar(max)
-- select @mStringVar = '02'

declare @result varchar(max), @VarLen int, @StartPointer int, @EndPointer int;
select @result = '', @VarLen = Len(@mStringVar), @StartPointer = 1, @EndPointer = 1;

While @EndPointer <= @VarLen
Begin
--48 to 57 is the ASCII for numbers 0-9; find the first position of number
If ASCII(Substring(@mStringVar,@EndPointer,1)) >= 48 And ASCII(Substring(@mStringVar,@EndPointer,1)) <= 57 AND @EndPointer <= @VarLen
Begin
-- select 'num', @StartPointer, @EndPointer, ASCII(Substring(@mStringVar,@EndPointer,1))
--While there are numbers, move the pointer ahead till it reaches the next char
While ASCII(Substring(@mStringVar,@EndPointer,1)) >= 48 AND ASCII(Substring(@mStringVar,@EndPointer,1)) <= 57 AND @EndPointer <= @VarLen
select @EndPointer = @EndPointer + 1

--Handle if there is single number or multiple number - single number will return 0 for (@EndPointer-@StartPointer)
If @StartPointer < @EndPointer
select @result =
CASE LEN(@result)
When 0 Then Substring('10000000000', 1, 11-Len(Substring(@mStringVar,@StartPointer,@EndPointer-@StartPointer))) + Substring(@mStringVar,@StartPointer,@EndPointer-@StartPointer)
-- Else @result + '-' + Substring(@mStringVar,@StartPointer,@EndPointer-@StartPointer)
Else @result + Substring('10000000000', 1, 11-Len(Substring(@mStringVar,@StartPointer,@EndPointer-@StartPointer))) + Substring(@mStringVar,@StartPointer,@EndPointer-@StartPointer)

END
Else If @StartPointer = @EndPointer
select @result =
CASE LEN(@result)
When 0 Then Substring('10000000000', 1, 11-Len(Substring(@mStringVar,@StartPointer,1))) + Substring(@mStringVar,@StartPointer,1)
-- Else @result + '-' + Substring(@mStringVar,@StartPointer,1)
Else @result + Substring('10000000000', 1, 11-Len(Substring(@mStringVar,@StartPointer,1))) + Substring(@mStringVar,@StartPointer,1)

END
--Mark the start of the chars
select @StartPointer = @EndPointer

-- select @StartPointer, @EndPointer
-- select @result

End
Else
Begin
-- select 'char', @StartPointer, @EndPointer, ASCII(Substring(@mStringVar,@EndPointer,1)),Substring(@mStringVar,@EndPointer,1)
--While there are chars, move the pointer ahead till it reaches the next number
While (ASCII(Substring(@mStringVar,@EndPointer,1)) < 48 Or ASCII(Substring(@mStringVar,@EndPointer,1)) > 57) AND @EndPointer <= @VarLen
select @EndPointer = @EndPointer + 1

--Handle if there is single number or multiple chars - single chars will return 0 for (@EndPointer-@StartPointer)
If @StartPointer < @EndPointer
select @result =
CASE LEN(@result)
When 0 Then Substring(@mStringVar,@StartPointer,@EndPointer-@StartPointer)
Else @result + Substring(@mStringVar,@StartPointer,@EndPointer-@StartPointer)
END
Else If @StartPointer = @EndPointer
select @result =
CASE LEN(@result)
When 0 Then Substring(@mStringVar,@StartPointer,1)
Else @result + Substring(@mStringVar,@StartPointer,1)
END
--Mark the start of the numbers
select @StartPointer = @EndPointer

-- select @StartPointer, @EndPointer
-- select @result

End

End
Return (@result)
End