Saturday, September 1, 2012

Columnstore Index


Columnstore index was introduced in the SQL Server 2012 to significantly speed-up the processing time of common data warehousing queries.

Generally data warehousing workloads involve summarizing large volume of data. As you might know, we use different techniques in data-warehousing (DW) and decision support systems (DSS) to improve performance like pre-computed summary tables, indexed views, OLAP cubes etc. Although, these techniques greatly speed up query the data processing but they have some side effects. These techniques can be inflexible, hard to maintain, and must be designed specifically for specific query problem.

Data warehousing processing overhead can be significantly reduced by using SQL Server Columnstore index. Not only that, columnstore indexes reduces the overhead of the other solutions. Columnstore indexes also enable queries to compute the results quickly so that pre-computation is not required.


Characterisitcs of SQL Server Columnstore Indexes:

·        Columnar data format – unlike the traditional row based organization of data (called rowstore format), columnstore indexes group and store data in one column at a time. SQL Server query processing can take advantage of this new data layout and significantly improve query execution time.

·        Faster query results – Columnstore indexes can produce faster results for the following reasons:

o    Only the columns needed must be read. Therefore, less data is read from disk to memory and then from memory to processor cache.
o    Columns are heavily compressed. This reduces the number of bytes that must be read and moved and hence less I/O operations.
o    Most queries do not touch all columns of the table. Therefore, many columns will never be brought into memory. This, combined with excellent compression, improves buffer pool usage, which reduces total I/O.
o    Advanced query execution technology processes chunks of columns called batches in a streamlined manner, reducing CPU usage.
·        Key columns – There is no concept of key columns in a columnstore index.

·        Clustered index key – all the columns of clustered index must be present in the nonclustered columnstore index. If a column in the clustering key is not listed in the create index statement, it will be added to the columnstore index automatically.

·        Partitioning – Columnstore indexes works with table partitioning. No change to the table partitioning syntax is required. Albeit, nonclustered columnstore index can only be created on a partitioned table if the partitioning column is part of the columnstore index.

·        Record Size – There is no index key record size limitation on columnstore indexes.

·        Query processing – Along with the columnstore index, SQL Server introduces batch processing to take advantage of the columnar orientation of the data. The columnstore structure and batch processing both contribute to performance gains.

·        Table cannot be updated – For SQL Server 2012, a table with a columnstore index cannot be updated.


How to create Columnstore Index:

Columnstore index can be create using SQL Server Management Studio as well as using T-SQL.

Creating Columnstore index by using SQL Server Management Studio (SSMS): 

1.       Open Management Studio and use Object Explorer to connect to the SQL Server Database Engine.
2.       In Object Explorer, expand the instance of SQL Server, expand Databases, expand a database, expand a table, right-click on required table, point to New Index, and then click Non-Clustered Columnstore Index.
3.       In the Index name dialog box, under the General tab, type a name for the new index, and then click Add.
4.       In the Select Columns dialog box, select the columns to participate in the columnstore index, and then click OK two times to create the index.


















Creating Columnstore index by using T-SQL:

Create columnstore index on Employee table, name it CSI_Employee, and include all the columns!


-- Check whether index CSI_Employee exists on Employee
-- and delete if exists.
IF EXISTS(  SELECT  name FROM sys.indexes
            WHERE   object_id = OBJECT_ID('dbo.Employee')
            AND     name = 'CSI_Employee')
DROP INDEX CSI_Employee ON dbo.Employee
GO
 
-- Create nonclustered columnstore index CSI_Employee on Employee.
CREATE NONCLUSTERED COLUMNSTORE INDEX CSI_Employee
ON dbo.Employee
(
     EmployeeID
    ,FirstName
    ,MiddleName
    ,LastName
    ,DOB
    ,DepartmentID
)
GO 

Data Types

Columnstore index support some data-types and doesn't support others. You need to be aware of this while creating columnstore index to avoid any run time errors.

The following data types can be included in a columnstore index:
·        char and varchar
·        nchar and nvarchar (except varchar(max) and nvarchar(max))
·        decimal (and numeric) (Except with precision greater than 18 digits.)
·        int, bigint, smallint, and tinyint
·        float (and real)
·        bit
·        money and smallmoney
·        All date and time data types (except datetimeoffset with scale greater than 2)

The following data types cannot be included in a columnstore index:
·        binary and varbinary
·        ntext, text, and image
·        varchar(max) and nvarchar(max)
·     uniqueidentifier
·         timestamp
·         sql_variant
·         decimal and numeric with precision greater than 18 digits
·         datetimeoffset with scale greater than 2
·         CLR types (hierarchyid and spatial types)
·         xml

 
There are certain restrictions and limitations on columnstore indexes which are listed below:

·         Columnstor index cannot have more than 1024 columns.
·         Columnstor index annot be clustered or unique. Only nonclustered columnstore indexes are available.
·         Columnstor index cannot be created on a view or indexed view.
·         Cannot include columns of restricted data-types as mentioned above.
·         Cannot act as a primary key or a foreign key.
·         Cannot be changed using the ALTER INDEX statement. Drop and re-create the columnstore index instead.
·         Cannot be created with the INCLUDE keyword.
·         Cannot include the ASC or DESC keywords for sorting the index. Columnstore indexes are ordered according to the compression algorithms.
·         Does not use or keep statistics in the manner of a traditional index.

Note: You can compare the performance optimization be executing heavy volume queries before and after creating columnstore index. You might get 50% to 90% performance gain!
 

No comments:

Post a Comment

Here are few FREE resources you may find helpful.