Optimizing Storage - SQL Server 2008

Users and organizations continue to pile gigabytes of information into their databases. From a cost standpoint, adding extra disk storage is rarely a roadblock.

The effects of very large amounts of data become evident during regular database maintenance and data retrieval. As the database grows and its data and log files increase in size, so do its backups. This not only increases the time needed to back up the database, but may also increase the recovery time in case of a failure. SQL Server 2008 addresses the data explosion by providing out-of-the-box support for database backup compression, as well as the ability to compress the data stored within the database itself.

Backup Compression

Backup compression has been a long-awaited feature within the SQL Server database engine. Until now, users who wanted to compress backups needed to purchase third-party backup solutions. In SQL Server 2008, backup compression is supported natively in the database engine for both disk and tape backup devices. This feature is available in the Enterprise Edition of SQL Server. (As with all other Enterprise Edition features, if you are using a Developer or Evaluation Edition, you will still be able to use backup compression.)
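
If you are not sure which edition a server is running, a quick check with the built-in SERVERPROPERTY function (which works on any edition) will tell you:

SELECT SERVERPROPERTY('Edition') AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion;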

You can enable backup compression by default on all database backups or for a specific backup. To enable compression for all backups, set the global server-level option 'backup compression default' to a value of 1, as shown in the following code:

USE master;
GO
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE WITH OVERRIDE;
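
To confirm the setting took effect, you can read it back from the sys.configurations catalog view:

SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'backup compression default';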

To enable compression for a specific backup, specify WITH COMPRESSION as an option to the BACKUP DATABASE statement.

To see backup compression at work, let’s back up the AdventureWorks database. To compare the savings, first back up the database without compression, and then back it up again with compression.

--Without compression
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorksDB_NoCompression.bak'
GO

--With compression
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorksDB_Compression.bak'
WITH COMPRESSION
GO

Now we have two backup files on our hard disk. To see the compressed backup size and the compression ratio, issue a query such as the following against the backupset table in msdb, which records a backup_size and a compressed_backup_size for every backup:
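
-- backup_size and compressed_backup_size are columns of msdb.dbo.backupset;
-- both are reported in bytes
SELECT database_name,
       backup_size / 1048576.0 AS backup_size_mb,
       compressed_backup_size / 1048576.0 AS compressed_backup_size_mb,
       backup_size / compressed_backup_size AS compression_ratio
FROM msdb.dbo.backupset
WHERE database_name = 'AdventureWorks';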

This query shows the original and compressed size of each backup.

You can see that the database backup size was about 172MB without compression vs. 40MB with compression. The actual compression savings will depend on the type of data being compressed. For example, encrypted data stored as varbinary will not compress as well as varchar data.

Obviously, when a backup is compressed, the backup file will be smaller. Compressing a backup also significantly reduces the elapsed time of the backup operation, since a compressed backup requires less device I/O than an uncompressed backup of the same data. However, while a backup is being compressed, CPU usage will increase. If you are concerned about how backup compression affects your server, you can watch a few performance counters: the “Device Throughput Bytes/sec” counter of the SQLServer:Backup Device object and the “Backup/Restore Throughput/sec” counter of the SQLServer:Databases object.
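
If you prefer to watch these counters from within SQL Server rather than in Performance Monitor, they are also exposed through the sys.dm_os_performance_counters dynamic management view. (On a named instance, the object_name values are prefixed with MSSQL$InstanceName instead of SQLServer.)

SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Device Throughput Bytes/sec',
                       'Backup/Restore Throughput/sec');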

Overall, backup compression will be a very welcome addition to the DBA’s backup strategies. The end result of backup compression is savings in the physical disk file size. However, this will not do us much good if our live data is so large that performance is suffering due to costly disk I/O. The good news is that SQL Server 2008 supports data compression as well, as discussed in the next section.

Data Compression

Most organizations that utilize databases do not have just one copy of the data. Data is usually backed up and transported to some off-site location. Some data might be transferred over to a staging database and then into a data warehouse for further processing.

As the world around us keeps producing more and more digitally born data, managing that data spins out of control. Suddenly, updating an index or performing a database backup takes longer. Your maintenance window may have been big enough before, but with more data, you may find that maintenance actions spill over into regular office hours. Users are complaining, and you think it might be time to get your resume ready. SQL Server 2008 has another tool in the toolbox to help with this data explosion: native support for data compression within tables and indexes.

Before we jump into the details of how data compression works within SQL Server, let’s consider how we could reduce the size of data in a database. At a high level, we need to reduce the amount of information stored, as the bottleneck we are circumventing is disk I/O contention. One way to achieve this is to compress the data file on the physical disk. This option would achieve our goal of reducing the data footprint; however, in a relational database environment, it would not be the best solution from a performance perspective. Since data is read in chunks from the file system, we would end up compressing and decompressing data that is never needed. The other option is to store the data more efficiently within the rows and pages of the database itself. This pushes more work onto the CPU in exchange for an efficient read and write of compressed data to the physical disk. Storing the data more efficiently within the database is the option most database vendors, including Microsoft, use to address the data-compression problem.

Variable-Length Decimal Storage Format

SQL Server 2008 is the first version to contain native data-compression support; however, SQL Server 2005 Service Pack (SP) 2 introduced an improvement designed to reduce the storage requirements for decimal values. Microsoft found that data warehouse fact tables contained a lot of zeros and nulls, and stored as a normal decimal datatype, each of these values took a good 5 bytes of storage. Add a few million rows to a fact table that contains a bunch of zeros and nulls, and you can see the advantage of introducing a way to knock down the size of the table.

The solution is the vardecimal storage format. This is not a new datatype; rather, it’s a switch that changes the internal storage of the decimal datatype. When zero and null values are stored in the vardecimal storage format, they are optimized to use only 2 bytes. Other values will take anywhere from 5 to 20 bytes each.

To take advantage of the vardecimal storage format, you need to enable the functionality both at the database level and the table level, as in the following example:

USE master;
GO
EXEC sp_db_vardecimal_storage_format 'AdventureWorks', 'ON';
GO
USE AdventureWorks;
GO
EXEC sp_tableoption 'Sales.SalesOrderDetail', 'vardecimal storage format', 'ON';

A simple call to sp_tableoption causes existing fixed-length decimal columns to behave as variable-length decimal datatypes, in the same way that the varchar datatype stores variable-length character data.
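
One way to verify that a table is using the vardecimal storage format is the TableHasVarDecimalStorageFormat property of the OBJECTPROPERTY function:

SELECT OBJECTPROPERTY(OBJECT_ID('Sales.SalesOrderDetail'),
       'TableHasVarDecimalStorageFormat') AS UsesVardecimal;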

Converting all your decimal columns to the vardecimal storage format may not always be the best solution. The trade-off is the same as with the varchar datatype: if you define a column as varchar(10) and almost always use the full ten characters of allocated space, you are much better off using char(10) and saving the overhead of the extra variable-offset information that is stored with variable-length datatypes.

If you were not already familiar with the vardecimal storage format in SQL Server 2005 SP2, you may be excited to give it a try on your decimal data. However, it is important to note that this feature is deprecated starting in SQL Server 2008. Since row compression in SQL Server 2008 performs the same task as enabling vardecimal support, the sp_db_vardecimal_storage_format and sp_tableoption options for enabling it are deprecated as well.

Row Compression

The vardecimal storage format introduced in SQL Server 2005 SP2 optimized the decimal datatype by changing the underlying storage of the data while not forcing developers to change their datatype declarations in their applications. In SQL Server 2008, this optimization of the decimal fixed-length datatype has been expanded to include other numeric types and types based on them, such as int, decimal, float, datetime, and money. This new behavior is wrapped up in a feature called row compression. Row compression does a great job at handling zero and null values. In fact, it takes no space to store zero or null values after row compression is enabled.
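
To see this effect in isolation, here is a minimal sketch; the table dbo.RowCompressionDemo and the row count are our own invention, not part of AdventureWorks. It fills a table with zeros and NULLs, then compares the space used before and after enabling row compression.

-- Hypothetical demo table; zero and NULL values take no space once row compression is on
CREATE TABLE dbo.RowCompressionDemo
(
    ID       int IDENTITY(1,1) PRIMARY KEY,
    Quantity int,
    Price    decimal(19,4)
);

-- Load 100,000 rows of zeros and NULLs
INSERT INTO dbo.RowCompressionDemo (Quantity, Price)
SELECT TOP (100000) 0, NULL
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

EXEC sp_spaceused 'dbo.RowCompressionDemo';   -- size before

ALTER TABLE dbo.RowCompressionDemo REBUILD
WITH (DATA_COMPRESSION = ROW);

EXEC sp_spaceused 'dbo.RowCompressionDemo';   -- size after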

SQL Server 2008 comes packed with tools that help DBAs configure and utilize the new compression feature. In addition to a Data Compression Wizard, which we will discuss a little later in this chapter, you can use the sp_estimate_data_compression_savings stored procedure to sample the data in your table and get an estimate of the compression savings. You do not need to enable compression on the entire table; compression can be enabled on a specific partition as well. Accordingly, this procedure enumerates all the partitions of the object being compressed and reports the estimated savings for each.

To help illustrate the power of sp_estimate_data_compression_savings, let’s take a look at one of the larger tables in the AdventureWorks database. The SalesOrderDetail table contains more than 120,000 rows of data, taking up about 10MB of space. This table contains a mix of datatypes, including int, money, and numeric, which are all good candidates for compression. To see the savings when row compression is applied, issue this statement:

EXEC sp_estimate_data_compression_savings 'Sales', 'SalesOrderDetail', NULL, NULL, 'ROW';

The results of this stored procedure contain eight columns:

object_name
schema_name
index_id
partition_number
size_with_current_compression_setting (KB)
size_with_requested_compression_setting (KB)
sample_size_with_current_compression_setting (KB)
sample_size_with_requested_compression_setting (KB)

The results show three entries for SalesOrderDetail but only one partition. This is because compression takes into consideration each of the three indexes defined on the table. An index_id of 1 represents the clustered index defined on the table. The index_id values 2 and 3 (actually, any value greater than 1) represent nonclustered indexes. (An index_id of 0 indicates a heap, a table without a clustered index.) When you enable compression on a table, the heap or the clustered index is compressed. Nonclustered indexes are not automatically compressed when you issue the compression statement. However, the results from the query show the before and after savings for all indexes.
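
You can see the indexes on the table and the current compression setting of each partition by joining sys.indexes to sys.partitions (the data_compression_desc column was added in SQL Server 2008):

SELECT i.index_id, i.name AS index_name, i.type_desc,
       p.partition_number, p.data_compression_desc
FROM sys.indexes AS i
JOIN sys.partitions AS p
    ON p.object_id = i.object_id AND p.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('Sales.SalesOrderDetail');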

To enable row compression on a table, you can use Data Definition Language (DDL) or the Data Compression Wizard (discussed shortly). For example, if we decided to go ahead and enable compression on the SalesOrderDetail table, we could issue the following:

ALTER TABLE [Sales].[SalesOrderDetail] REBUILD
WITH (DATA_COMPRESSION = ROW)

From the sp_estimate_data_compression_savings results, we can expect the table size to now be about 7.54MB (7728KB).
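
To confirm the actual size after the rebuild, you can check the table’s space usage with sp_spaceused:

EXEC sp_spaceused 'Sales.SalesOrderDetail';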

As noted, you can also compress just partitions. If we had multiple partitions and wanted to compress just a single partition, our statement would be something like the following:

ALTER TABLE [Sales].[SalesOrderDetail] REBUILD PARTITION = 4
WITH (DATA_COMPRESSION = ROW)
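
As noted earlier, nonclustered indexes are not compressed by the table rebuild; each must be rebuilt explicitly. Here is a sketch, assuming one of the table’s nonclustered indexes is named IX_SalesOrderDetail_ProductID (substitute your own index name):

ALTER INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
REBUILD WITH (DATA_COMPRESSION = ROW)

-- Or rebuild every index on the table in one statement:
ALTER INDEX ALL ON [Sales].[SalesOrderDetail]
REBUILD WITH (DATA_COMPRESSION = ROW)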

Page Compression

Compressing data with row compression is good if you have a lot of unique data. However, if you have a lot of frequently occurring data, you still end up storing the same repeated compressed value. This is where page compression comes in.

Enabling page compression reduces the amount of redundant information stored on each page. It does this by utilizing column prefixes and dictionary compression.

Using column prefixes, a prefix list is stored in the page. This list contains patterns of common values per column. Consider the sample data for the Sales.SalesOrderDetail table, with columns ProductID, UnitCost, SalesPrice, Inventory, and ProductGroupID.

Sample Data for Sales.SalesOrderDetail Table

You can see that the ProductID column does not have any repeated values, and thus there are no valid column prefixes. The UnitCost column has the value 32 repeated twice, so the database engine stores the value 32 in a column prefix variable CA and replaces the contents of the data in the row to point to CA. The SalesPrice column also has a repeated value of 45. The database engine will create a column prefix value CB for the value 45 and replace the data in the row with pointers to CB.

The Inventory column initially doesn’t appear to have any repeated values. However, the database engine notices that the numbers 167 are the first three digits of two rows in this column, so it will store 167 as a column prefix and label it CC. In the actual rows of data, it will place a pointer to the column prefix and the extra information. In the first row, the extra information is 0 to make the full value of 1670. In the second row, the extra information is 2 to make the full value of 1672. In the third row, the value stored is 16, which is the first two digits of our column prefix 167. The database engine has some additional smarts to recognize this and stores a pointer to CC and an indication that it should use only the first two digits in the prefix.

In the last column, ProductGroupID, the database engine will create a column prefix CD and give it a value of 1. It will then store pointers to CD wherever the value 1 is stored in any of the rows for that given column.

To recap, our column prefix values are as follows:

CA: 32
CB: 45
CC: 167
CD: 1

Our data page now resembles the following table.

Data Page After Column Prefix Definitions

The other optimization that occurs with page compression is dictionary compression. This looks for common values stored across all columns and rows, as opposed to column prefixes, which consider just a given column. In our example, we have the value 1 in both the ProductID column and the ProductGroupID column. Thus, the database engine creates a dictionary entry DA with a value of 1 and replaces the value 1 in the ProductID column and the column prefix value CD in the ProductGroupID column. This page-level process continues, and in our example, we end up with two additional page-level dictionary entries: DB with a value of 32, and DC with a value of 2. After dictionary compression, the page will resemble the following table.

Data Page After Column Prefix Definitions and Dictionary Compression

As with row compression, you can use the sp_estimate_data_compression_savings stored procedure to estimate your savings with page-level compression. Issuing the following statement shows the savings estimate for the SalesOrderDetail table in the AdventureWorks database using page compression:

EXEC sp_estimate_data_compression_savings 'Sales', 'SalesOrderDetail', NULL, NULL, 'PAGE';

The results of this statement show the estimated sizes with page compression.

As some of you may have speculated, our sample table benefits significantly from page compression. Instead of the 7.54MB estimated for row compression, page compression yields a table size of just 4928KB, or about 4.8MB.

As with row compression, you can enable page compression using the Data Compression Wizard (discussed next) or DDL. For example, the following code will enable page compression on the SalesOrderDetail table:

ALTER TABLE [Sales].[SalesOrderDetail] REBUILD
WITH (DATA_COMPRESSION = PAGE)

Using the Data Compression Wizard

In addition to DDL, you can use the Data Compression Wizard in SQL Server Management Studio to easily configure data compression. Under the covers, the wizard uses the same stored procedures and DDL statements discussed in the previous sections. If you are not a command-line junkie, you will appreciate this easy point-and-click way of configuring compression. The wizard also lets you try different configurations to determine which yields the best compression results.

To launch the Data Compression Wizard, in Object Explorer, right-click any table or index and select Storage ➤ Manage Compression. The figure below shows an example of using the wizard with the Sales.SalesOrderDetail table of the AdventureWorks database.

Data Compression Wizard showing per-partition compression settings

As the wizard shows, this table has about 121,000 rows and takes about 9.8MB for the data alone. This table is definitely a good candidate for compression. By choosing different options from the Compression Type drop-down list, you can see estimates of the savings for each compression type. In the figure, Page is chosen as the compression type, and the wizard shows that the table would end up being about 4.8MB instead of more than 9.8MB. If you selected row compression, you would see that this table would end up being about 7.5MB, which is only about a 20 percent reduction, as opposed to the more than 50 percent reduction achieved using page compression.

Monitoring Compression Performance

Two compression-specific performance monitor counters may be useful in monitoring compression performance. Both of these counters are available as part of the SQLServer:Access Methods object.

Assuming page compression is enabled on a table, when rows are added to a page, nothing out of the ordinary happens. When the page fills up and is about to split, the database engine checks whether the kind of data stored on it would benefit from compression. This check is reflected in the “Page compression attempts/sec” counter. If page compression will be of benefit, the database engine compresses the page and increments the “Pages compressed/sec” counter.

Since the check itself taxes the CPU, a large number of attempts with a relatively small number of actual compressions may delay query execution. This situation can happen when the stored data is more unique than repetitive. Thus, in situations where page compression attempts far exceed actual page compressions, you may want to forgo compressing those tables.
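
These counters are also visible through the sys.dm_os_performance_counters view, which makes it easy to pull both values side by side and compare them:

SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Access Methods%'
  AND counter_name IN ('Page compression attempts/sec',
                       'Pages compressed/sec');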

