Indexing and Performance Enhancements - SQL Server 2008

SQL Server contains a variety of performance features that DBAs can exploit, including table and index partitioning, filtered indexes, and persisted computed columns, among others.

The types of indexes available in the SQL Server 2008 relational engine are essentially the same as those available in previous versions, with the addition of specialized XML and spatial index types. The basic index types are clustered and nonclustered. Both types of indexes are implemented internally using a variant of a B-tree data structure. A clustered index reorganizes the base data pages of the indexed table, whereas a nonclustered index is created in separate data pages. A table in SQL Server 2008 can have a single clustered index and up to 999 nonclustered indexes (up from 249 in earlier versions).

Clustered indexes are generally used to support primary keys and should be built on “narrow” columns or groups of columns; many sources recommend that clustered index key columns not exceed a total of 16 bytes per row. This is because the key column data is repeated in the leaf nodes of every nonclustered index.

Nonclustered indexes are created by default to support unique keys. They are also added purely to improve query performance. It’s important for DBAs to remember not to go overboard when creating nonclustered indexes: each update to a column that participates in a nonclustered index must be written twice, once to the base table and once to the index pages. Creating too many nonclustered indexes can, therefore, have a negative impact on data modification performance.

Online Indexing

A common problem in high-availability scenarios is how and when to perform operations such as index creation, which might decrease response times or totally block other transactions. As it’s often impossible to predict all indexes that a system might require once it goes live, it’s important to be able to apply these changes to the production system. SQL Server provides this capability using its online indexing feature.

Creating, altering, and dropping clustered indexes takes schema modification locks that block other processes from reading and writing the table, while creating and altering nonclustered indexes takes shared locks that block other processes from writing. Both situations can be avoided using the online indexing feature. Using this feature allows other processes to continue normal operations, but the indexing operation itself may be quite a bit slower than in offline mode. If it’s important that other processes be able to continue normal operations during indexing—for instance, when indexing a table in an active OLTP database—this feature should be used. If concurrency is not important, the default offline indexing mode can be used to complete indexing operations more quickly.

To use the online indexing option, use the WITH clause for CREATE INDEX, ALTER INDEX, or DROP INDEX:

CREATE INDEX ix_Table
ON Table (Column)
WITH (ONLINE = ON)

The default value for the ONLINE option is OFF.

Note that this option is not available for indexing operations on tables containing LOB datatypes (TEXT, NTEXT, and IMAGE) or when creating XML indexes.

Controlling Locking During Index Creation

To further control the effects of indexing on other processes that might be attempting to access the data simultaneously, SQL Server allows the DBA to specify whether the indexing process can use row- or page-level locks. Tweaking these options can improve concurrency when creating indexes in live production systems, but beware: overriding what might be the query optimizer’s best option can mean that index creation will take a much longer time. These options should be used only in specific situations in which problems are occurring due to lock contention during index creation. In most cases, they should be left set to their default values.

The DBA can turn off row locking using the ALLOW_ROW_LOCKS option:

CREATE INDEX ix_Table
ON Table (Column)
WITH (ALLOW_ROW_LOCKS = OFF)

Page locking can be turned off using the ALLOW_PAGE_LOCKS option:

CREATE INDEX ix_Table
ON Table (Column)
WITH (ALLOW_PAGE_LOCKS = OFF)

The default value for both of these options is ON, meaning that both row- and page-level locks are allowed. You can combine the options with each other or the ONLINE option by separating the options with a comma:

CREATE INDEX ix_Table
ON Table (Column)
WITH (ONLINE = ON, ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = OFF)

Creating Indexes with Additional Columns Included

Even before SQL Server 2005, DBAs could add key columns to nonclustered indexes to “cover” frequently executed queries. For instance, consider the following table and index:

CREATE TABLE DatabaseSystems
(
DatabaseSystemId INT,
Name VARCHAR(35),
IsRelational CHAR(1),
IsObjectOriented CHAR(1),
SupportsXML CHAR(1),
FullSpecifications VARCHAR(MAX)
)
CREATE NONCLUSTERED INDEX IX_Name
ON DatabaseSystems (Name)

A DBA might want to query this table to find out which databases with names starting with S also happened to support XML:

SELECT Name, SupportsXML
FROM DatabaseSystems
WHERE Name LIKE 'S%'
AND SupportsXML = 'Y'

While the LIKE predicate is satisfied by the index, the database engine still has to do a lookup on the base table to get the SupportsXML column. To eliminate the additional lookup and “cover” the query (that is, support all of the columns from the table used in the query), the index can be dropped and a new one created to include the SupportsXML column:

DROP INDEX IX_Name ON DatabaseSystems
CREATE NONCLUSTERED INDEX IX_Name_SupportsXML
ON DatabaseSystems(Name, SupportsXML)

The query engine can now get all of the data to satisfy the query from the nonclustered index, without ever looking up data in the table itself.

But what if IX_Name had been a unique index? Or what if the DBA wanted to cover queries that included the FullSpecifications column? Solving the first problem would require creating a new index and leaving the previous one, which would end up wasting space. And indexing the FullSpecifications column was not possible at all. Indexes in SQL Server 2000 could contain only up to 900 bytes per row. Indexing a large varchar was simply not an option.

SQL Server now includes an indexing option designed to solve these problems. DBAs can specify additional columns to be included in a nonclustered index, using the INCLUDE keyword. Included columns are not part of the index key but are stored in the index’s leaf-level pages along with the indexed data, so they can be used to cover queries. There are no restrictions on width beyond those already enforced at the table level, and uniqueness can still be specified for the indexed columns.

To create a unique index that covers the query, use the following:

CREATE UNIQUE NONCLUSTERED INDEX IX_Name
ON DatabaseSystems(Name)
INCLUDE (SupportsXML)

An index could also be created that would cover queries for either SupportsXML or FullSpecifications, or both:

CREATE UNIQUE NONCLUSTERED INDEX IX_Name
ON DatabaseSystems(Name)
INCLUDE (SupportsXML, FullSpecifications)

Keep in mind that creating large indexes that include many large columns can both use a lot of disk space and require massive amounts of I/O when updating or inserting new rows. This is due to the fact that any columns included in a nonclustered index will have their data written to disk twice: once in the base table and once in the index. When using this option to eliminate clustered index lookups, test to ensure that the additional disk strain will not be a problem when writing data.

Altering Indexes

SQL Server 2000 introduced a method of altering an existing index by creating a new one in its place, using the WITH DROP_EXISTING option. This option is especially useful for altering existing clustered indexes as it incurs less overhead than dropping and re-creating the index, by allowing the index to be modified without rebuilding existing nonclustered indexes.

SQL Server makes index alteration a first-class T-SQL operation. The ALTER INDEX syntax, while similar to that of CREATE INDEX, does not support altering an index’s column list; the WITH DROP_EXISTING option of CREATE INDEX must still be used for that. However, ALTER INDEX offers much additional functionality.
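
For example, assuming the IX_Name index from the “Creating Indexes with Additional Columns Included” section still existed, its column list could be changed in place, keeping the same index name, with something like the following sketch:

CREATE NONCLUSTERED INDEX IX_Name
ON DatabaseSystems (Name, SupportsXML)
WITH (DROP_EXISTING = ON)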

In the following sections, you’ll see how a DBA can use ALTER INDEX to defragment an index (replacing DBCC INDEXDEFRAG), rebuild an index (replacing DBCC DBREINDEX), and disable an index.

Defragmenting an Index

As indexes age, insertion and deletion of noncontiguous data can take its toll and cause fragmentation to occur. Although minor amounts of fragmentation won’t generally hurt performance, it’s a good idea to occasionally defragment indexes to keep databases running as smoothly as possible. Defragmentation, also known as index reorganization, defragments data within data pages, but does not move data pages between extents. Since only data within pages is moved, very little blocking will occur during the defragmentation process, and data can remain available to other processes. However, because extents are not defragmented, this may not be an effective method for defragmenting larger, heavily fragmented indexes. For those situations, index rebuilding is necessary (see the next section, “Rebuilding an Index,” for more information).

To determine the level of fragmentation of an index, the dynamic management function sys.dm_db_index_physical_stats can be used. The column avg_fragmentation_in_percent returns the percentage of fragmented data. Unlike DBCC SHOWCONTIG, extent and logical scan fragmentation are not displayed separately. Instead, the avg_fragmentation_in_percent column shows extent fragmentation for heap tables and logical scan fragmentation for tables with clustered indexes or when displaying information about nonclustered indexes.

Although there is no hard and fast rule, a common recommendation is to keep index fragmentation below 10 percent whenever possible. Microsoft recommends defragmenting indexes that are 30 percent or less fragmented and rebuilding indexes that are more than 30 percent fragmented. To identify indexes in the current database that have more than 10 percent fragmentation, the following query can be used:

SELECT
OBJECT_NAME(i.object_id) AS TableName,
i.name AS IndexName,
ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') ips
JOIN sys.indexes i ON
i.object_id = ips.object_id
AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10

The arguments to the sys.dm_db_index_physical_stats function are database ID, table ID, index ID, partition ID, and scan mode. In this example, DB_ID() is passed for the database ID, which tells the function to scan tables in the current database. NULL is passed for table ID, index ID, and partition ID, so that the function does not filter on any of those criteria. Finally, a detailed scan is used. Possible scan modes are LIMITED (the default), SAMPLED, and DETAILED. LIMITED scans only parent-level nodes and is therefore the fastest scan mode. SAMPLED scans parent-level nodes and a percentage of leaf nodes based on the number of rows in the table. DETAILED scans all nodes and is therefore the slowest scan method.
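
For routine monitoring, a cheaper variant restricts the scan to a single table and uses the default LIMITED mode. Here is a sketch against the Customers table used in the next example:

SELECT
OBJECT_NAME(ips.object_id) AS TableName,
ips.index_id,
ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
DB_ID(), OBJECT_ID('Customers'), NULL, NULL, 'LIMITED') ips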

Once a fragmented index is identified, it can be defragmented using ALTER INDEX with the REORGANIZE option. The following query will defragment the index IX_CustomerName on the table Customers:

ALTER INDEX IX_CustomerName
ON Customers
REORGANIZE

Rebuilding an Index

Index defragmentation reorganizes only the leaf-level nodes of an index. Unfortunately, there are times when that isn’t enough to eliminate index fragmentation, and the entire index needs to be rebuilt. The REBUILD option of ALTER INDEX can be used to facilitate this process. This is equivalent to the functionality of the deprecated DBCC DBREINDEX function.

Rebuilding an index is, by default, an offline operation, because pages and extents are being shuffled. When rebuilding a clustered index, the base table will be locked for the duration of the rebuild, and when rebuilding a nonclustered index, the index will be unavailable during the rebuild. However, ALTER INDEX includes an online indexing option to get around this problem. To rebuild the index IX_CustomerName on table Customers using the online indexing option, use the following query:

ALTER INDEX IX_CustomerName
ON Customers
REBUILD
WITH (ONLINE=ON)

The ONLINE option works by indexing the data outside the data pages in which the data resides, applying deltas for any data modifications, and then updating pointers from the old index to the new index. Because this operation occurs in a separate area, online reindexing will use approximately twice as much disk space as offline reindexing. The process can also be slower, in the case of databases that are very update intensive, due to the additional overhead associated with tracking data changes.

This option is therefore best used for databases that require very high availability; if downtime is acceptable, the ONLINE option will provide no benefit.

Disabling an Index

SQL Server offers an intriguing feature for indexing, namely the ability to disable indexes. This feature generated plenty of speculation as to when and where it should be used. The reality is that this feature was not created for DBAs. Rather, Microsoft included it to make updates and service packs easier to apply. There are no performance benefits or any other “hot topic” uses for disabling an index. Nonetheless, this feature can be handy in some situations.

Disabling a nonclustered index deletes the index’s data rows but keeps its metadata—the index’s definition—intact. Disabling a clustered index, on the other hand, keeps the data but renders it inaccessible until the index is reenabled. And disabling a nonclustered index that is being used to enforce a primary key or unique constraint will disable the constraint.

To disable an index, use ALTER INDEX with the DISABLE option:

ALTER INDEX IX_CustomerName
ON Customers
DISABLE

The index can be reenabled using the REBUILD option:

ALTER INDEX IX_CustomerName
ON Customers
REBUILD

Note that rebuilding a disabled index will require only as much disk space as the index requires, whereas rebuilding a nondisabled index requires twice the disk space: disk space for the existing index and disk space for the new, rebuilt index.

So when should index disabling be used? There are a few circumstances in which it will prove useful. A common task during Extract, Transform, and Load (ETL) processes is dropping indexes before doing bulk data loads, and then re-creating the indexes at the end of the process. Index disabling will lead to fewer code changes; there will be no need to update the ETL code when index definitions change.
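
For instance, a bulk-load window might bracket the load with DISABLE and REBUILD instead of scripted DROP and CREATE statements. This is a sketch reusing the IX_CustomerName index from the earlier examples:

-- Disable the index; its definition stays in the catalog
ALTER INDEX IX_CustomerName
ON Customers
DISABLE

-- ...bulk load data into Customers here...

-- Reenable the index by rebuilding it
ALTER INDEX IX_CustomerName
ON Customers
REBUILD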

Another scenario is systems with low disk space that need indexes rebuilt. Since rebuilding a disabled index takes up half the space compared to rebuilding a nondisabled index, this could prove useful in tight situations. However, note that unlike rebuilding a nondisabled index using the ONLINE option, a disabled index will not be available for online operations during the rebuild process.

A final possible use of index disabling is for testing various index configurations in situations in which the query optimizer isn’t necessarily making the correct choice. Disabling and reenabling indexes should make this process a bit less painful for DBAs, by providing an automatic “backup” of the indexes being worked with.

Using Filtered Indexes

A traditional index on one or more columns covers all the rows within those columns. For the majority of cases, this may be what you need. But consider a column in which only a few rows contain non-null values. Indexing the entire column would result in a larger-than-needed index and would consume precious CPU cycles and DBA time whenever index maintenance is needed. Instead, create a filtered index, which indexes only those rows meeting criteria that you specify. Thus, you can choose to index only the rows having non-null values in the columns being indexed. Filtered indexes can also be used in situations where you have heterogeneous columns that contain categories of data or when you query columns that have ranges of values such as dollar amounts, times, and dates.
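
For example, a mostly null column could be indexed only where it holds a value (a sketch; the ProductReviews table and review_score column are hypothetical):

-- Index only the rows that actually have a review score;
-- NULL rows are excluded from the index entirely
CREATE NONCLUSTERED INDEX NC_ReviewScore
ON ProductReviews(review_score)
WHERE review_score IS NOT NULL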

To help illustrate the value of filtered indexes, consider the scenario of a hardware store that wishes to use a database to store inventory data. The table schemas designed for this business include an Inventory and a ProdCat table. The Inventory table contains all the products that the store sells. These products are grouped into categories like Garden, Electrical, and Hardware. The tables are defined as follows:

CREATE TABLE ProdCat
(product_category_id INT PRIMARY KEY,
product_category_name VARCHAR(30) NOT NULL)
CREATE TABLE Inventory
(product_name VARCHAR(30) NOT NULL,
quantity INT NOT NULL,
product_category INT REFERENCES ProdCat(product_category_id))
INSERT INTO ProdCat VALUES (1,'Garden')
INSERT INTO ProdCat VALUES (2,'Electrical')
INSERT INTO ProdCat VALUES (3,'Hardware')

The hardware store’s database application queries the Inventory table quite frequently for the current inventory of garden accessories. To help the performance of the queries, you would want to create an index on the product_name and quantity columns. If you created a regular nonclustered index, the index would contain not only the Garden category information, but also Electrical and Hardware, since it’s an index for all the rows. To get an even bigger performance gain, you can create a filtered index.

To create the filtered index, use the CREATE INDEX statement followed by the filter predicate. Here is the filtered index definition for the hardware store example:

CREATE NONCLUSTERED INDEX NC_Categories
ON Inventory(product_name,quantity)
WHERE product_category=1

Creating this index and running this simple statement

SELECT product_name,quantity FROM Inventory WHERE product_category=1

yields a query that runs about six times faster than if no index were defined.

Filtered indexes do have a few restrictions:

  • They can be created only as nonclustered indexes.
  • They are not allowed on views unless they are persisted views.
  • They cannot be created as XML indexes or full-text indexes.
  • They cannot reference a computed column or a user-defined type column (this implies that filtered indexes don’t work with the new hierarchyid and spatial datatypes).

Even with these restrictions, filtered indexes lower index maintenance costs and storage requirements by targeting specific sets of data.

Filtered indexes are supported by SQL Server Management Studio as well as the Database Engine Tuning Advisor (DTA) tool. DTA will analyze a workload against SQL Server and give you recommendations to improve performance. Filtered indexes are included as part of the suggestions in DTA.

Using Filtered Statistics

Filtered statistics are created automatically with the same filter predicate whenever users create a filtered index, as described in the previous section. However, you can manually create a filtered statistic using the CREATE STATISTIC statement. Queries on subsets of data that contain nonindexed columns benefit from creating a filtered statistic. Filtered statistics on a subset of rows can improve the accuracy of the statistics that the query optimizer uses for the query plan. Filtered statistics can also be effective with queries that select from correlated columns.

The syntax for creating filtered statistics is similar to creating a nonfiltered statistic. Here is an example:

CREATE STATISTICS Quantity
ON Inventory(Quantity)
WHERE product_category=2;

Using Statistics for Correlated Datetime Columns

SQL Server includes an optimization to assist with queries on tables that share similar datetime data. When turned on, extra statistics will be generated for datetime columns. Joining two tables, each with datetime columns that share a foreign key reference, may allow the query optimizer to be able to determine a better plan using the additional statistics.

For instance, the AdventureWorks sample database contains a table of orders called Sales.SalesOrderHeader and a table of corresponding order detail (line items) called Sales.SalesOrderDetail. Each table contains a datetime column, ModifiedDate.

Assume that for auditing purposes, it’s a business requirement of Adventure Works Cycles that any modification to an order detail row happens within 24 hours of a modification to the corresponding order header row. This would mean that all ModifiedDate values in the Sales.SalesOrderDetail table would fall into a range between the order header’s modified date and 24 hours later. The query optimizer could potentially use this fact to improve performance of certain queries.

A requirement for the optimizer being able to use correlated datetime statistics is that at least one of the tables’ datetime columns must be the key column for a clustered index. Since neither table includes the ModifiedDate column in its clustered index, one of the clustered indexes would need to be altered in order to use this optimization.
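
A sketch of what that alteration might look like, purely as illustration; in AdventureWorks the existing clustered primary key on Sales.SalesOrderDetail would first have to be re-created as nonclustered, which is omitted here:

-- Illustrative only: cluster the detail table on its datetime column
-- so the optimizer can use correlated datetime statistics
CREATE CLUSTERED INDEX IX_SalesOrderDetail_ModifiedDate
ON Sales.SalesOrderDetail (ModifiedDate)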

Once statistics are turned on and the correct indexes are in place, the query optimizer will be able to use the statistics to help drive better query plans. For instance, given the Sales.SalesOrderHeader and Sales.SalesOrderDetail tables, a user might want to see all orders modified in the last 24 hours and their corresponding line items, using the following query:

SELECT *
FROM Sales.SalesOrderHeader SOH
JOIN Sales.SalesOrderDetail SOD ON SOH.SalesOrderId = SOD.SalesOrderId
WHERE SOH.ModifiedDate >= DATEADD(hh, -24, GETDATE())

If date correlation is enabled and the correct indexes are in place, the query optimizer can analyze the datetime statistics for these two tables and determine that data for the ModifiedDate column of the SalesOrderHeader table is always 24 hours or less before the ModifiedDate column of the corresponding rows in the SalesOrderDetail table. This can allow the query to be internally rewritten into the following, possibly more efficient, format:

SELECT *
FROM Sales.SalesOrderHeader SOH
JOIN Sales.SalesOrderDetail SOD ON SOH.SalesOrderId = SOD.SalesOrderId
WHERE SOH.ModifiedDate >= DATEADD(hh, -24, GETDATE())
AND SOD.ModifiedDate >= DATEADD(hh, -24, GETDATE())
AND SOD.ModifiedDate <= GETDATE()

This form of the query can take advantage of a clustered index that involves the SalesOrderDetail’s ModifiedDate column, thereby possibly avoiding an expensive clustered index lookup operation. This will be especially advantageous as the dataset in each table grows larger and the date columns become more highly selective.

To turn on date correlation statistics for the AdventureWorks database, the following T-SQL would be used:

ALTER DATABASE AdventureWorks
SET DATE_CORRELATION_OPTIMIZATION ON

Note that when performing this action, the database must not have any users connected or the only connection should be the one running the ALTER DATABASE statement.
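
One way to guarantee that is to wrap the change in single-user mode. This is a sketch; WITH ROLLBACK IMMEDIATE forcibly disconnects other sessions, so use it with care:

ALTER DATABASE AdventureWorks
SET SINGLE_USER WITH ROLLBACK IMMEDIATE

ALTER DATABASE AdventureWorks
SET DATE_CORRELATION_OPTIMIZATION ON

ALTER DATABASE AdventureWorks
SET MULTI_USER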

Once the optimization is enabled, it will be automatically maintained by the query engine. Due to the extra work involved, there is a performance penalty for inserts or updates, so make sure to test carefully before rolling this into production environments. To find out if a database has date correlation turned on, query the is_date_correlation_on column of the sys.databases catalog view:

SELECT
Name,
is_date_correlation_on
FROM sys.databases

The column is_date_correlation_on will have a value of 1 if date correlation is turned on for a database; otherwise, it will have a value of 0.

Improving Performance of Ordering for Tertiary Collations

For situations in which string case sensitivity is unimportant from a uniqueness perspective but necessary for sorting purposes, SQL Server supports so-called tertiary collations. String data defined with these collations will be ordered based on case sensitivity (uppercase letters will sort before lowercase letters). However, grouping by or using the distinct operator on such a column will result in uppercase and lowercase letters being treated identically.

For example, take the following table, which includes an indexed, tertiary-collated column:

CREATE TABLE Characters
(
CharacterString CHAR(3)
COLLATE SQL_Latin1_General_CP437_CI_AI
)
CREATE CLUSTERED INDEX IX_Characters
ON Characters (CharacterString)
INSERT Characters VALUES ('aaa')
INSERT Characters VALUES ('Aaa')

Selecting the data from this table using an ORDER BY clause on the CharacterString column will result in two rows being returned. The row with the value 'Aaa' will sort first, followed by the row with the value 'aaa'. However, selecting the data from this table using the DISTINCT option returns only a single row. Only sorting is case sensitive. Grouping and uniqueness operations are not case sensitive.
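
The behavior can be seen with two quick queries against the sample table:

-- Returns two rows: 'Aaa' sorts before 'aaa'
SELECT CharacterString
FROM Characters
ORDER BY CharacterString

-- Returns one row: case is ignored for uniqueness
SELECT DISTINCT CharacterString
FROM Characters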

Ordering a tertiary-collated column requires an intermediate step during which weights for each character are determined. This step is expensive, so users have the ability to precalculate the weights using the TERTIARY_WEIGHTS function.

Selecting data from the table ordered by CharacterString requires an intermediate computation and sort, even though the data in the index is already sorted, as the execution plan for an ordered SELECT statement on this table shows.

[Figure: Sorting on tertiary-collated columns requires an intermediate step.]

The solution to this problem is to create a computed column using the TERTIARY_WEIGHTS function and add it to the index to be used for sorting. The table and index should have been created this way:

CREATE TABLE Characters
(
CharacterString CHAR(3)
COLLATE SQL_Latin1_General_CP437_CI_AI,
CharacterWeights AS (TERTIARY_WEIGHTS(CharacterString))
)
CREATE CLUSTERED INDEX IX_Characters
ON Characters
(
CharacterString,
CharacterWeights
)

The intermediate sort is no longer required, as the new execution plan shows.

[Figure: When a computed column using the TERTIARY_WEIGHTS function is used in the index, the intermediate step is no longer required.]

Table and Index Partitioning

A common requirement in dealing with larger datasets is the ability to split the data into smaller chunks to help improve performance. Performance degradation becomes apparent once tables reach larger sizes, and splitting data across files and disks is one way to help databases scale. Although previous versions of SQL Server supported various means of partitioning data—either manually or via features like partitioned views—doing so has always been somewhat of a headache-inducing experience. Partitioning in versions prior to SQL Server 2000 meant splitting data across multiple tables, and then writing application code that could properly navigate the partitions. Things got better with SQL Server 2000’s partitioned views feature, but it was difficult to swap data in or out of the partitions without affecting data availability.

Today, SQL Server makes data partitioning much easier, thanks to the inclusion of a partitioning strategy that allows the server to automatically handle partitioning of tables and indexes based on range data. Partition ranges are defined using functions called partition functions, and ranges are assigned to one or more filegroups using partition schemes. After a function and scheme are created, tables and indexes can use them for partitioning data. In this section, we’ll examine how to use these features to build better-performing databases.

Partition Functions

Partition functions are the means by which the DBA can control which ranges of data will be used to enforce partition boundary values. These functions map partitions based on a datatype and ranges of values for that datatype, but they do not actually partition anything. Due to the fact that they only define partitions, partition functions are reusable; a single function can be used to partition many tables or indexes using the same ranges. The basic syntax for creating a partition function is as follows:

CREATE PARTITION FUNCTION partition_function_name(input_parameter_type)
AS RANGE [ LEFT | RIGHT ]
FOR VALUES ( [ boundary_value [ ,...n ] ] )
[ ; ]

Partition functions must take a single input parameter (a column) of a specific datatype—multicolumn partition functions are not supported. The function is defined in terms of ranges, and the LEFT or RIGHT designator controls the placement of the actual boundary value. For a LEFT function, each partition will be defined as containing all values less than or equal to its upper bound. For a RIGHT function, each partition will be defined as containing all values less than its upper bound; the boundary value itself will go into the next partition.

Partition ranges cannot be designed to constrain input values to a given range. Values that fall below the lowest bound will be placed into the lowest partition. Values that fall above the highest bound will be placed into an automatically generated partition for values above that bound. For example, to create a partition function based on fiscal quarters of 2008, the following T-SQL could be used:

CREATE PARTITION FUNCTION pf_FiscalQuarter2008 (DATETIME)
AS RANGE RIGHT FOR VALUES
('20080401', '20080701', '20081001', '20090101')

This function actually creates five partitions. The first partition contains every value less than April 1, 2008 (remember, RANGE RIGHT defines less-than values; if you wanted to include midnight for April 1, 2008, you could use a RANGE LEFT partition). The second, third, and fourth partitions contain all values less than July 1, 2008, October 1, 2008, and January 1, 2009, respectively. The final, implicit partition contains all values greater than or equal to January 1, 2009.
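
The boundary placement can be verified with the $PARTITION function, which returns the partition number into which a given value would fall:

-- Returns 2: May 15, 2008, falls into the second partition
SELECT $PARTITION.pf_FiscalQuarter2008('20080515')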

Partition Schemes

Partition schemes are the means by which the boundary values defined in partition functions can be mapped to physical filegroups. The DBA has the option of either mapping all of the partitions from a function into the same filegroup (using the ALL option) or specifying a filegroup for each partition individually. The same filegroup can be used for multiple partitions.

The basic syntax for creating a partition scheme is as follows:

CREATE PARTITION SCHEME partition_scheme_name
AS PARTITION partition_function_name
[ ALL ] TO ( { filegroup_name | [PRIMARY] } [ ,...n] )
[ ; ]

To specify that all partitions from the partition function pf_FiscalQuarter2008 (defined in the preceding section) should be mapped to the primary filegroup, the following T-SQL would be used:

CREATE PARTITION SCHEME ps_FiscalQuarter2008_PRIMARY
AS PARTITION pf_FiscalQuarter2008
ALL TO ([PRIMARY])

This example uses the ALL option to map all of the partitions to the same filegroup. It should also be noted that the primary filegroup is always specified using square brackets when defining partition schemes.

If the DBA wanted to map the first two partitions to the filegroup Q1Q2_2008 and the other three partitions to the filegroup Q3Q4_2008, the following T-SQL would be used:

CREATE PARTITION SCHEME ps_FiscalQuarter2008_Split
AS PARTITION pf_FiscalQuarter2008
TO (Q1Q2_2008, Q1Q2_2008, Q3Q4_2008, Q3Q4_2008, Q3Q4_2008)

Note that this example assumes that the filegroups have already been created in the database using ALTER DATABASE ADD FILEGROUP. Also be aware that multiple schemes can be created for a single function, so if there are several objects in a database that should be partitioned using the same data ranges but that should not share the same filegroups, multiple functions do not need to be created.

Creating Partitioned Tables and Indexes

Once partition functions and schemes have been defined, the DBA can begin using them to partition tables and indexes, which is, of course, the point to this whole exercise. CREATE TABLE and CREATE INDEX both have an ON clause that has been used in previous editions of SQL Server to specify a specific filegroup in which the table or index should be created. That clause still functions as before, but it has now been enhanced to accept a partition scheme.

Given the partition function and schemes created in the previous sections for fiscal quarters in 2008, the following T-SQL could be used to create a partitioned table to record sales amounts, partitioned by the time of the sale:

CREATE TABLE SalesAmounts
(
SalesAmountId INT NOT NULL PRIMARY KEY NONCLUSTERED,
SalesAmount NUMERIC(9,2) NOT NULL,
SalesDate DATETIME NOT NULL
)
GO
CREATE CLUSTERED INDEX IX_SalesAmounts_SalesDate
ON SalesAmounts (SalesDate)
ON ps_FiscalQuarter2008_Split (SalesDate)

The table is created using a nonclustered primary key, leaving the table itself available for indexing using a clustered index. Since a table’s clustered index organizes the data in the entire table, creating the cluster on the partition range partitions the entire table.

Data from this table will now be partitioned based on the ps_FiscalQuarter2008_Split scheme, using SalesDate as the partitioning column. Data for any date less than July 1, 2008, will be placed in the Q1Q2_2008 filegroup; data for any date greater than or equal to July 1, 2008, will be placed in the Q3Q4_2008 filegroup. Likewise, when selecting data from this table using the SalesDate column as a predicate in the WHERE clause, the query engine will need to seek only the necessary partitions for the requested data.

Creating a partitioned index is very similar to creating a partitioned table; the ON clause is used to specify a partition scheme. For instance, to create a nonclustered index on the SalesAmounts table for seeking SalesAmount values, the following T-SQL syntax could be used:

CREATE INDEX IX_Amount
ON SalesAmounts
(
SalesAmount
)
ON ps_FiscalQuarter2008_PRIMARY (SalesDate)

This index will be partitioned on the SalesDate column, and because the partition scheme ps_FiscalQuarter2008_PRIMARY was specified, all five partitions will be maintained in the primary filegroup. Note that the partitioning column, SalesDate, need not be included in the index.

Adding and Removing Partitions

In addition to creating new partitioned tables and indexes, SQL Server also exposes capabilities for DBAs to partition existing tables, modify range boundaries of existing functions and schemes, and swap data in and out of partitions.

Partitioning an existing table can be done in one of two ways. The easier method is to create a clustered index on the table, partitioned using whatever partition scheme the DBA wishes to employ. The other method requires manipulation of partition functions and will be covered in the next section, “Modifying Partition Functions and Schemes.”

Assume that in the same database that contains the SalesAmounts table and related partition function and schemes there exists the following table, which contains times that customers visited the store:

CREATE TABLE Visitors
(
VisitorId INT NOT NULL,
VisitDate DATETIME NOT NULL,
CONSTRAINT PK_Visitors
PRIMARY KEY (VisitorId, VisitDate)
)

The DBA might wish to partition this table using the same scheme as the sales data, such that data in similar date ranges will share the same filegroups. This table already has a clustered index, implicitly created by the PK_Visitors primary key constraint. To partition the table, the constraint must be dropped. The constraint then must be re-created using a partition scheme. The following T-SQL code accomplishes that:

SET XACT_ABORT ON

BEGIN TRANSACTION
ALTER TABLE Visitors
DROP CONSTRAINT PK_Visitors
ALTER TABLE Visitors
ADD CONSTRAINT PK_Visitors
PRIMARY KEY (VisitorId, VisitDate)
ON ps_FiscalQuarter2008_Split (VisitDate)
COMMIT

To avoid inconsistent data, the entire operation should be carried out in a single transaction. SET XACT_ABORT is used to guarantee that runtime errors in the transaction will force a rollback.

Converting this table back to a nonpartitioned table can be done using either the reverse operation (dropping the partitioned clustered index and re-creating the index nonpartitioned) or by modifying the partition function to have only a single partition.

Modifying Partition Functions and Schemes

Partition functions can be altered in two primary ways. Ranges can be “merged” into other ranges (dropped) or new ranges can be “split” off of existing ranges (added). Removing ranges using the MERGE keyword is quite straightforward; splitting new ranges using the SPLIT keyword can be a bit more involved, as the partition scheme must also be altered in order to handle the new partition.

For example, if the DBA wished to eliminate the range from pf_FiscalQuarter2008 ending on September 30, thereby creating a larger range that ends on December 31, the following T-SQL would be used:

ALTER PARTITION FUNCTION
pf_FiscalQuarter2008()
MERGE RANGE ('20081001')

The range specified in a merge must be exactly convertible to a range boundary that exists in the partition function. As a result of a merge operation, the data will be merged into the next partition, and the partition scheme(s) associated with the function will be updated appropriately. As mentioned earlier, this can be one way of departitioning a table: alter the associated partition function, merging the ranges until only a single partition remains.

Splitting a partition function to create new range boundaries requires first altering the associated scheme(s) to provide a “next-used” filegroup, which will receive the additional partition range data. Remember that a partition scheme must supply a filegroup for every partition defined by its underlying function.

To add a next-used filegroup to a partition scheme—in this case, specifying that additional partitions can be placed in the primary filegroup—the following T-SQL could be used:

ALTER PARTITION SCHEME ps_FiscalQuarter2008_Split
NEXT USED [PRIMARY]

Once the scheme has been appropriately altered, the partition function itself can have an additional range added. To add a range to pf_FiscalQuarter2008 for all of 2009 (not minding that the function is now misnamed), the following T-SQL could be used:

ALTER PARTITION FUNCTION
pf_FiscalQuarter2008()
SPLIT RANGE ('20100101')

Remember that because the function is a RANGE RIGHT function, this new range boundary ends on December 31, 2009. Data from this range will be placed into the primary filegroup, as that was the next-used filegroup designated before the range was created.

Switching Tables Into and Out of Partitions

DBAs can move data into partitions from unpartitioned tables and out of partitions back into unpartitioned tables. The former can be useful for data-loading processes, as data can be bulk-loaded into an unindexed table and then switched into a partitioned structure. The latter can be useful for data archival or other purposes.

Assume that the following staging table has been created for 2009 visitor data:

CREATE TABLE VisitorStaging_2009
(
VisitorId INT NOT NULL,
VisitDate DATETIME NOT NULL
)

This table has exactly the same schema as the Visitors table, which is partitioned using the ps_FiscalQuarter2008_Split scheme. It should also have been created on whatever filegroup the DBA wishes the data to eventually end up on as part of the partition scheme. For the sake of this example, that will be assumed to be the primary filegroup.

Once data for the 2009 time period has been bulk-loaded into the table, the same indexes and constraints must be created on the staging table as exist on the Visitors table. In this case, that’s only the PRIMARY KEY constraint:

ALTER TABLE VisitorStaging_2009
ADD CONSTRAINT PK_Visitors_2009
PRIMARY KEY (VisitorId, VisitDate)

A CHECK constraint must also be created on the table to guarantee that the data falls within the same range as the partition into which the table will be switched. This can be done with the following T-SQL for this example:

ALTER TABLE VisitorStaging_2009
ADD CONSTRAINT CK_Visitors_06012009_12012010
CHECK (VisitDate >= '20090101' AND VisitDate < '20100101')

Once the CHECK constraint is in place, the table is ready to be switched into the new partition. First, the partition boundary number for the new partition should be queried from the sys.partition_functions and sys.partition_range_values catalog views:

SELECT rv.boundary_id
FROM sys.partition_functions f
JOIN sys.partition_range_values rv ON f.function_id = rv.function_id
WHERE rv.value = CONVERT(datetime, '20100101')
AND f.name = 'pf_FiscalQuarter2008'

This value can then be plugged into the SWITCH TO option of ALTER TABLE. In this case, the boundary ID is 4, so the following T-SQL switches the VisitorStaging_2009 table into that partition:

ALTER TABLE VisitorStaging_2009
SWITCH TO Visitors PARTITION 4

The data from the staging table can now be logically queried from the Visitors partitioned table. The staging table can be deleted.

Switching tables out of partitions is much easier. Assuming that the DBA wanted to switch the data back out of the partition just populated from the staging table, the DBA could re-create the staging table—again, on the same filegroup and with the same clustered index, but this time without the CHECK constraint. Once the empty table is in place, the data can be switched out of the partition using the following T-SQL:

ALTER TABLE Visitors
SWITCH PARTITION 4 TO VisitorStaging_2009

Managing Table and Index Partitions

Management of table and index partitions is similar to management of tables and indexes, with one major difference: it’s possible to reindex a specific partition in a table, should the DBA not wish to reindex the entire table at once. In addition, SQL Server includes a series of catalog views to assist with enumerating and viewing data related to partitions.

Rebuilding an index for a specific partition number is very similar to rebuilding an entire index, with the addition of a new clause to the ALTER INDEX syntax: the PARTITION clause. This clause takes a partition number as input. For instance, to rebuild partition 4 of the PK_Visitors index on the Visitors table—assuming that the index is partitioned—the following T-SQL would be used:

ALTER INDEX PK_Visitors
ON Visitors
REBUILD
PARTITION = 4

The ONLINE option and other indexing options are also available. This functionality can help DBAs to more accurately pinpoint and eliminate performance bottlenecks in large partitioned tables. Three catalog views are provided to assist with viewing partition function data:

  • The sys.partition_functions view contains data about which partition functions have been created.
  • The sys.partition_range_values view, used with the sys.partition_functions view in an example in a previous section, contains the actual ranges specified for a function.
  • The sys.partition_parameters view contains information about the parameter datatype used for a function.

The sys.partition_schemes view contains information about schemes. The sys.partitions and sys.partition_counts views contain data about the actual mapping between tables and their partitions, including row counts, used data pages, reserved data pages, and various other statistics.
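
For example, the following query, a sketch against the Visitors table, shows per-partition row counts by joining sys.partitions to sys.indexes:

SELECT
i.name AS IndexName,
p.partition_number,
p.rows
FROM sys.partitions p
JOIN sys.indexes i ON
i.object_id = p.object_id
AND i.index_id = p.index_id
WHERE p.object_id = OBJECT_ID('Visitors')
ORDER BY i.index_id, p.partition_number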

Using Indexed Views

Although indexed views are still fairly restrictive (DBAs cannot use subqueries, derived tables, and many other constructs), they have been made slightly more flexible with each release of SQL Server. The query optimizer can match more query types to indexed views. These include scalar expressions, such as (ColA + ColB) * ColC, and scalar aggregate functions, such as COUNT_BIG(*). For instance, the following indexed view could be created in the AdventureWorks database, indexed on the OrderTotal column:

CREATE VIEW Sales.OrderTotals
WITH SCHEMABINDING
AS
SELECT
SalesOrderId,
SubTotal + TaxAmt + Freight AS OrderTotal
FROM Sales.SalesOrderHeader
GO
CREATE UNIQUE CLUSTERED INDEX IX_OrderTotals
ON Sales.OrderTotals
(OrderTotal, SalesOrderId)

The query optimizer will now be able to consider queries such as the following for optimization by using the indexed view:

SELECT SalesOrderId
FROM Sales.SalesOrderHeader
WHERE SubTotal + TaxAmt + Freight > 300

This optimization also includes better matching for queries against indexes that use user-defined functions.

Using Partition-Aligned Indexed Views

Indexed views are powerful because query result sets are materialized immediately and persisted in physical storage in the database, which saves the overhead of performing costly operations like joins or aggregations at execution time. Up until the release of SQL Server 2008, indexed views were of little use in the table-partitioning scenario, because it was very difficult to switch in or out a partition to an underlying table that had an indexed view defined on it. To use indexed views with partitions, users needed to drop the indexed view, switch partitions, and then re-create the view. This solution could take quite some time to complete. SQL Server 2008 solves this problem with the introduction of partition-aligned indexed views.

Partitions are switched in and out of tables via the ALTER TABLE statement with the SWITCH option. This DDL transfers subsets of data between source and target tables quickly and efficiently. There are a number of requirements when using this statement. For example, the source and target tables (and views) must share the same filegroup. To avoid frustration when trying to switch partitions, see SQL Server Books Online’s enumeration of the requirements, under the “Transferring Data Efficiently by Using Partition Switching” topic.

There are no special DDL statements to support partition-aligned indexed views; rather, an indexed view simply continues to work when you SWITCH partitions. If you use partitions in your database and need indexed views, this feature will definitely be a welcome addition in your environment.

Persisting Computed Columns

In certain situations, it can be useful to create columns whose values are computed dynamically by the SQL Server engine when referenced in a query, rather than inserted with an explicit value. Prior versions of SQL Server included the computed column feature for this purpose. Computed columns could be indexed, and the data existed within the index to be used for seeking or by queries covered by the index. However, a noncovered query that needed the same data could not use the value persisted within the index, and the value had to be recomputed at runtime. For complex computed columns, this can become a serious performance drain.

To eliminate this problem, SQL Server supports the PERSISTED keyword when creating a computed column. Its behavior is simple enough: instead of the column’s value being calculated at runtime, it is calculated only once, at insert or update time, and stored on disk with the rest of the column data.

To add a new persisted computed column to a table, the following T-SQL could be used, assuming that dbo.VeryComplexFunction() is a very complex function that is slowing down SELECT statements:

ALTER TABLE SalesData
ADD ComplexOutput AS
(dbo.VeryComplexFunction(CustomerId, OrderId))
PERSISTED

Note that existing computed columns cannot be made persisted; they must be dropped and re-created as persisted computed columns. Likewise, persisted computed columns cannot be altered back into regular computed columns; they too will need to be dropped and re-created.
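
For instance, if ComplexOutput had originally been added without the PERSISTED option, the conversion would be a drop followed by a re-add (a sketch reusing the hypothetical SalesData example):

-- Drop the existing (nonpersisted) computed column...
ALTER TABLE SalesData
DROP COLUMN ComplexOutput

-- ...and re-add it with the PERSISTED option
ALTER TABLE SalesData
ADD ComplexOutput AS
(dbo.VeryComplexFunction(CustomerId, OrderId))
PERSISTED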

To determine which computed columns are persisted, query the is_persisted column of the sys.computed_columns catalog view. is_persisted will have a value of 1 if the column is persisted and 0 otherwise. For instance, the following query shows which columns of which tables in the current database are persisted computed columns:

SELECT OBJECT_NAME(object_id), name
FROM sys.computed_columns
WHERE is_persisted = 1
