Today i want to tell you something about statistics in SQL Server 2008. Microsoft SQL Server 2008 collects statistical information about indexes and column data stored in the database. These statistics are used by the SQL Server query optimizer to choose the most efficient plan for retrieving or updating data (BOL). While in the majority of cases SQL Server takes care of statistical housekeeping automatically, it’s useful to have some understanding of what’s going on when you have to intervene.
SQL Server collects the following information:
• The number of rows and pages accupied by a table’s data
• The time that statistics were last updated
• Average length of keys in a column
• Histograms showing the distribution of data
• String summaries thart are used when performaing LIKE queries.
Suppose you have an column with following numbers Anderson, Andersen,Smith,Williams, Zhang. An histogram would look like this :
Andersen 2
Smith 1
Williams 1
Zhang 1
Suppose that you are searching for a value Anderson, Smith and Williams then the estimation, based on the above example would be 4 rows. This about 80% total rows that match the criteria. This would mean that SQL Server is going to scan the table fully instead of using the index (off course in a real world example you need more data).
I used the following code from MSDN :
-- Clean up objects from any previous runs.
IF object_id(N'Person.Contact','U') IS NOT NULL
DROP TABLE Person.Contact
GO
IF EXISTS (SELECT * FROM sys.schemas WHERE name = N'Person')
DROP SCHEMA Person
GO
-- Create a sample schema and table.
CREATE SCHEMA Person
GO
CREATE TABLE Person.Contact(
FirstName nvarchar(60),
LastName nvarchar(60),
Phone nvarchar(15),
Title nvarchar(15)
)
GO
-- Populate the table with a few rows.
INSERT INTO Person.Contact
VALUES(N'John',N'Smith',N'425-555-1234',N'Mr')
INSERT INTO Person.Contact
VALUES(N'Erik',N'Andersen',N'425-555-1111',N'Mr')
INSERT INTO Person.Contact
VALUES(N'Erik',N'Andersen',N'425-555-3333',N'Mr')
INSERT INTO Person.Contact
VALUES(N'Jeff',N'Williams',N'425-555-0000',N'Dr')
INSERT INTO Person.Contact
VALUES(N'Larry',N'Zhang',N'425-555-2222',N'Mr')
GO
-- Show that there are no statistics yet on the Person.Contact table.
sp_helpstats N'Person.Contact', 'ALL'
GO
-- Implicitly create statistics on LastName.
SELECT * FROM Person.Contact WHERE LastName = N'Andersen'
GO
-- Show that statistics were automatically created on LastName.
sp_helpstats N'Person.Contact', 'ALL'
GO
--DROP STATISTICS Person.Contact._WA_Sys_00000002_7D78A4E7
--Same Histograms.
DBCC SHOW_STATISTICS(N'Person.Contact', _WA_Sys_00000002_7D78A4E7)
--DBCC SHOW_STATISTICS(N'Person.Contact', IX_Contact_Lastname)
Below you can see an example of the histogram: