
Reducing Storage Requirements and Improving Performance Using Microsoft SQL Server 2008 R2 Data Compression

Technical Case Study

Published: May 2012

The following content may no longer reflect Microsoft’s current position or infrastructure. This content should be viewed as reference documentation only, to inform IT business decisions within your own company or organization.

OEM IT took advantage of the data compression features available in SQL Server 2008 R2 to create a compression strategy that decreased its storage requirements by 33 percent. By using data compression, OEM IT was able to reduce storage costs, improve performance, improve server health and I/O performance, achieve faster response times for users, and provide ongoing data capacity management.


Situation

OEM IT shared a commitment with Microsoft IT to reduce the size of its database footprint by 24 percent.

Solution

OEM IT applied compression to 30 tables and indexes in each of its top six databases and reduced its storage requirements by 33 percent.

Benefits

  • Reduced storage footprint
  • Increased performance
  • Reduced time required for replication across all environments

Products & Technologies

  • SQL Server 2008 R2
  • SQL Server 2012

Situation

OEM IT, a subdivision of Microsoft IT, builds and manages 40 line-of-business (LOB) applications, with quarterly software release cycles that support more than $19 billion USD in revenue created by the Microsoft original equipment manufacturer (OEM) and the System Builder channel business.

The LOB applications that OEM IT supports perform functions such as certifying, tracking, and shipping certificate of authenticity (COA) licensing to Microsoft's OEM partners. Because OEM IT systems support a large percentage of Microsoft's revenue generation, the business applications they support are considered system critical. OEM IT had roughly 5 terabytes (TB) of critical data in its production databases, which was also propagated to 14 preproduction environments that had to be refreshed monthly.

Aside from the costs of storing 63 TB of data, the monthly refresh to the preproduction environments had become time consuming, often requiring about 24 hours to copy the data. OEM IT had a shared commitment with Microsoft IT to reduce its data storage footprint by 24 percent to save on the cost associated with purchasing the new disk space that would be required for expansion. By substantially reducing the size of the production databases, OEM IT could greatly affect the overall data footprint across all of the preproduction environments, both saving money and reducing the amount of time required to refresh the data in those environments.

OEM IT looked to the compression features available in Microsoft® SQL Server® 2008 R2 to help manage its ever-increasing amount of data, control storage costs, and improve its LOB application performance. After implementing compression, OEM IT experienced a 33 percent reduction in its data storage footprint. The reduced data footprint also increased performance and overall server health.

"Data Compression has been an obvious win for OEM IT. This represents a huge savings in server storage, hardware requirements, and time. It has made our preproduction refresh process much more efficient while helping improve the performance of all our environments."
—Mike Adams
OEM Principal Development Lead
Microsoft Corporation

Solution

OEM IT reviewed the available SQL Server 2008 R2 product documentation and best practice guidance about data compression to help develop its data compression strategy. The team tailored that guidance to the OEM IT environment by identifying the best practices that applied and augmenting them with additional automation.

Note: For more information about creating a data compression strategy, see Data Compression: Strategy, Capacity Planning and Best Practices.

"We were able to see a lot of benefits from data compression, because we took the time to understand the workload of our systems. Understanding workloads and the CPU utilization of our SQL Server instances helped us select the correct data compression type, and thorough planning helped us identify which tables and indexes needed to be compressed and estimate current and future capacity. Another key activity that contributed to the success of this effort was testing in the preproduction environment; it was instrumental to understand the impact of compression before we applied it in the production environment."
Lisa Specchio
Database Administrator, OEM IT

and

Shishir Abhyanker
Database Lead, OEM IT

Assessing the Environment

Data compression, which includes rebuilding selected partitions, tables, and indexes with compression, can help reduce the size of a database as well as improve the performance of I/O-intensive workloads. Because compressed data is stored in fewer pages, queries need to read fewer pages from the disk, thereby improving the performance of I/O-intensive workloads. However, extra CPU resources are required on the database server to compress and decompress the data while data is exchanged with the application. Therefore, it is important to understand the workload characteristics when deciding which tables and indexes to compress. The servers in OEM IT were I/O bound and had plenty of CPU resources available to handle the minor increase in processor load that compression caused.
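The scripts OEM IT used for this assessment are not published with this case study. As a rough sketch, one way to gauge whether an instance is I/O bound is to look at the share of total wait time spent waiting on data page I/O latches:

-- Rough indicator of an I/O-bound instance: the share of total wait time
-- spent waiting on data page I/O latches since the last service restart
SELECT
    SUM(CASE WHEN wait_type LIKE 'PAGEIOLATCH%' THEN wait_time_ms ELSE 0 END) * 100.0
        / NULLIF(SUM(wait_time_ms), 0) AS pct_wait_time_on_page_io
FROM sys.dm_os_wait_stats;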

Identifying What to Compress

The OEM IT team decided to focus their initial compression efforts on the top six databases in their environment. The six databases were chosen based on a combination of criteria, including:

  • How much of the overall data footprint percentage they represented. OEM IT considered the databases with the largest percentage of the total data footprint. In doing so, the team needed to compress only six of their more than 30 databases to achieve their storage reduction goals.

  • The performance of the applications the database supported. OEM IT looked at databases that were not performing well to determine whether compression would improve their performance.

  • The type of data that was contained in the databases. One of the databases selected for compression was chosen because it had bar chart data, making it particularly suitable for compression. It was a large database, but the tables contained a lot of empty space, so it was compressed by more than 50 percent. Tables that contain the following patterns of data compress particularly well:

    • Columns with numeric or fixed-length character data types, where most values do not require all the allocated bytes (for example, integers where most values are less than 1000)

    • Nullable columns, where a significant number of the rows have a NULL value for the column

    • Significant amounts of repeating data values or repeating prefix values in the data

Estimating Compression Space Savings

The OEM IT team developed scripts to scan the six selected databases and identify the top 30 tables and indexes for compression. In addition to the top 30 objects from the reporting database, the database administrator specified that objects in the selected database that are replicated but not necessarily in the top 30 should also be compressed.
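OEM IT's scanning scripts are not included in this case study; a query along the following lines produces a similar ranking of the largest tables and indexes in a database by reserved space (a sketch only, not OEM IT's actual script):

-- Rank the largest tables and indexes in the current database by reserved space
SELECT TOP (30)
    OBJECT_SCHEMA_NAME(p.object_id)   AS schema_name,
    OBJECT_NAME(p.object_id)          AS table_name,
    i.name                            AS index_name,
    SUM(au.total_pages) * 8 / 1024.0  AS reserved_mb
FROM sys.partitions AS p
JOIN sys.allocation_units AS au ON au.container_id = p.partition_id
JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE OBJECTPROPERTY(p.object_id, 'IsMSShipped') = 0
GROUP BY p.object_id, i.name
ORDER BY reserved_mb DESC;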

The OEM IT team used the SQL Server 2008 sp_estimate_data_compression_savings stored procedure to help estimate the space savings from row-level and page-level compression for their tables and indexes, and then applied statistical formulas to the results to determine the best fit.

Transact-SQL Code for Estimating Free Space in Database Files



SELECT
    a.file_id,
    LOGICAL_NAME      = a.name,
    PHYSICAL_FILENAME = a.physical_name,
    FILEGROUP_NAME    = b.name,
    FILE_SIZE_MB      = CONVERT(DECIMAL(12,2), ROUND(a.size/128.000, 2)),
    SPACE_USED_MB     = CONVERT(DECIMAL(12,2), ROUND(FILEPROPERTY(a.name, 'SpaceUsed')/128.000, 2)),
    FREE_SPACE_MB     = CONVERT(DECIMAL(12,2), ROUND((a.size - FILEPROPERTY(a.name, 'SpaceUsed'))/128.000, 2))
FROM sys.database_files a
LEFT OUTER JOIN sys.data_spaces b
    ON a.data_space_id = b.data_space_id
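A minimal example of calling sp_estimate_data_compression_savings for a single table is shown below; the schema and table names are hypothetical rather than OEM IT's actual objects:

-- Estimate the space savings from page compression for one table and all of its indexes
-- (schema and table names are hypothetical examples)
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'OemOrderDetail',
    @index_id         = NULL,     -- NULL = all indexes on the table
    @partition_number = NULL,     -- NULL = all partitions
    @data_compression = 'PAGE';   -- or 'ROW' to estimate row compression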

After the team ran the stored procedure, they applied mathematical formulas to help them estimate which tables and indexes would provide the highest percentage of compression. OEM IT also built specific queries to identify whether to apply page or row compression for each table or index. The team assessed several row and page compression options:

  • Row-compress some tables, page-compress others, and do not compress the rest.

  • Page-compress a heap or clustered index, but have no compression on its non-clustered indexes.

  • Row-compress one index, and have no compression on another index.

  • Row-compress some partitions of a table, page-compress others, and do not compress the rest.

The OEM IT team, based on their workload analysis and data patterns, chose to use page compression for the tables and indexes that they selected for compression (Table 1).

Table 1. Compression Estimations for Five of the Selected Databases

Database                 Size           Estimated size   % Reduction
OEM OLTP                 620,500 MB     346,100 MB       44.22%
OEM Adhoc Reporting DB   855,500 MB     581,600 MB       32.02%
OEM Reporting DB         140,100 MB     82,600 MB        41.04%
Data Warehouse           1,021,800 MB   388,000 MB       62.03%
OEM Staging DB           429,000 MB     133,500 MB       68.89%

Understanding Workloads

Tables and indexes are compressed using the ALTER TABLE ... REBUILD and ALTER INDEX ... REBUILD statements, respectively. Compressing a table or an index requires workspace, CPU, and I/O, and it uses the same mechanism as rebuilding an index. One of the main planning activities that OEM IT undertook was understanding its workloads. The development team worked on proofs of concept to determine what the performance impact would be during database compression.
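The general form of these statements is shown below; the table and index names are placeholders rather than OEM IT's actual objects:

-- Rebuild a table (its heap or clustered index) with page compression
ALTER TABLE dbo.OemOrderDetail
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Rebuild a single nonclustered index with row compression
ALTER INDEX IX_OemOrderDetail_ProductID
    ON dbo.OemOrderDetail
    REBUILD WITH (DATA_COMPRESSION = ROW);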

Workspace

The OEM IT team knew that compressing the tables and indexes would require additional space while the operation ran, and that they needed to plan for increases in the workspace of:

  • User database

  • Transaction log

  • tempdb

OEM IT estimated the workspace requirements before starting the compression to ensure that there was enough free space to avoid potentially expensive auto-grow of the database files or failed compression resulting from lack of disk space. The developers created a small set of scripts that helped them determine how much storage needed to be added before starting compression.

I/O

I/O is generally proportional to the workspace used. OEM IT knew that its systems were I/O bound.

CPU

Rebuilding an index with page compression typically takes two to five times the CPU time of rebuilding the same index without compression.

Applying Compression in Offline or Online Mode

OEM IT also had to consider whether to apply compression in offline or online mode. Offline operations are faster but do not allow other processes to run during compression. Online operations require about twice as much CPU as offline operations. For example, one of the databases selected for compression showed a 16 percent increase in CPU during compression in the test environment. OEM IT was not concerned by this, because its systems had a surplus of unused CPU capacity. Because CPU was not a constraint, OEM IT applied compression in Online-Full Recovery mode.

The tables and indexes were compressed during a window of time during which the system was offline for other maintenance, so there was nothing else running during the compression. Because compression was performed in online mode, OEM IT did not need to rebuild its indexes, which saved additional time and effort.
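A hedged sketch of an online compression rebuild follows; the index and table names are hypothetical, and the ONLINE option requires an edition that supports online index operations:

-- Apply page compression while keeping the table available for reads and writes
ALTER INDEX PK_OemShipment
    ON dbo.OemShipment
    REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);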

Note: For a summary of workspace, CPU, and I/O requirements for compressing a clustered index as compared to rebuilding the same uncompressed index, see Data Compression: Strategy, Capacity Planning and Best Practices.

Applying Compression with Parallelism

OEM IT applied compression in a preproduction test environment before applying it in the production environment. While applying compression in the test environment, the team discovered that the compression exercise was quite time consuming, taking about six hours for their largest database and about 12–18 hours to compress all six databases.

OEM IT's compression strategy was initially sequential: the team compressed the primary database before beginning to compress any of the child or subscriber databases, because there was some concern about issues caused by compressing publishers and subscribers at the same time.

As OEM IT learned more about compression behavior, the team discovered SQL Server parallelism and the MAXDOP setting, which can be used to specify the number of processors (up to eight) used during a table rebuild with compression. Running with parallelism did not cause any issues and helped cut the compression time from 12–18 hours to about six hours.
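A rebuild that combines compression with parallelism might look like the following sketch; the table name is hypothetical:

-- Rebuild with page compression, allowing up to eight processors to work in parallel
ALTER TABLE dbo.OemLicenseActivity
    REBUILD WITH (DATA_COMPRESSION = PAGE, MAXDOP = 8);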

Compressing the Production and Preproduction Environments

After testing compression in a preproduction test environment, OEM IT applied compression during a planned system downtime window in the production environment by rebuilding the tables selected in each of the six chosen databases with compression. Because the compression was performed in Online mode, SQL Server rebuilt the clustered index and compressed the underlying data.

To ensure that their system-critical applications did not lose any data if something should happen while they were compressing the objects in the production environments, the OEM IT team created a specific backup and recovery plan for during and after production database compression.

After the production database was compressed, the compressed data from the production database was refreshed to all of the preproduction environments. Because the production versions of those databases were now compressed, all of the corresponding databases in the other environments became compressed, as well.

Reclaiming the Space Released by Data Compression

After data compression was complete, the space saved was released to the respective data files. However, the space was not released to the file system, because file size is not reduced automatically as part of data compression. OEM IT had several options for returning the space to the file system by reducing the size of the files:

  • Option 1. Do nothing. If the free space is not reclaimed, it can be kept in the file group for future data growth.

  • Option 2. DBCC SHRINKFILE (or DBCC SHRINKDATABASE) is an option, but it is time consuming, and shrinking a database file severely fragments its contents.

  • Option 3. If all of the tables in a filegroup are being compressed, a new filegroup can be created, and the tables and indexes can be moved into it as part of the compression rebuild.

OEM IT chose option 1, and all of the space released by compression was allocated for future growth. This strategy helped reduce OEM IT's need to request additional storage from the data center to keep up with its ongoing growth.

Compressing New Data

After compression has been applied to a table or index, any new data inserted or updated in that table or index is also compressed going forward. Compressing the new data helps OEM IT continue to meet the shared goal of reducing its data footprint by 24 percent.

New tables and indexes added to the six databases are assessed to determine whether they should be built with compression. Because of the space savings and the notable improvement in performance, there are ongoing efforts to apply compression on other databases in the production environment.
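New objects can also be created compressed from the start; a simplified example under assumed (hypothetical) object names is shown below:

-- Create a new table with page compression from the outset (hypothetical names)
CREATE TABLE dbo.OemShipmentHistory
(
    ShipmentID  INT       NOT NULL,
    PartnerID   INT       NOT NULL,
    ShippedDate DATETIME2 NOT NULL,
    CONSTRAINT PK_OemShipmentHistory PRIMARY KEY CLUSTERED (ShipmentID)
)
WITH (DATA_COMPRESSION = PAGE);

-- Create a supporting nonclustered index with the same compression setting
CREATE NONCLUSTERED INDEX IX_OemShipmentHistory_PartnerID
    ON dbo.OemShipmentHistory (PartnerID)
    WITH (DATA_COMPRESSION = PAGE);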

Lessons Learned

OEM IT learned from its implementation experience that it is important to assess the other technologies in the environment when applying compression. Preproduction and test environments do not always exactly match the production environment. Although the database objects being tested may be identical, the behaviors in how the production database interacts with other technologies while it is fulfilling its operational processes can be different.

Log Shipping and Mirroring

OEM IT did not have log shipping or mirroring in the test environment, and the technologies were overlooked during the planning phases. It was not until the team actually started compressing data in the production environment that they realized that they should have considered what compression was going to do to those technologies.

The issues with log shipping and mirroring were specific to the OEM IT environment and its processes, but they do illustrate the need to assess the potential effects on underlying technologies and processes while designing a compression strategy.

Mirroring

Mirroring was not able to catch up on some of the databases being compressed. When mirroring started to fall significantly behind, OEM IT got concerned and broke those mirrors. After the compression was finished, the team reestablished mirroring on the compressed databases.

The team made that decision because they had not tested compression with mirroring and were not sure exactly what would happen. They later determined that, because they were not running in high-performance mode, they could have let mirroring continue and it would have caught up.

Log Shipping

OEM IT's resiliency process keeps the disaster recovery environment three days behind the production environment, so that if something happens during a code deployment weekend, the workload can fail over to the disaster recovery environment and the systems can be brought back up with minimum effort and delay. While compression was running in the production environment, the log shipping files became too large to copy to the secondary site and somewhat unmanageable, and OEM IT did not have the room it needed to store the three days' worth of logs required by its disaster recovery and resiliency processes.

During the initial compression, OEM IT broke log shipping for the databases being compressed. After the compression was complete, the team was able to reestablish log shipping on those compressed databases. Breaking log shipping during compression is not a best practice, and OEM IT learned from this scenario that if it had planned in advance for the increased size of the logs, it could have added the extra storage required or changed its process during the compression downtime. Because the team had not planned for it, no mitigation plan was in place.

Benefits

By reducing the size of its database footprint using compression, the OEM IT group saved money on disk costs, reduced the need to expand the storage allocations, and saved time during environment refreshes.

OEM IT experienced an overall data footprint reduction of roughly 33 percent—from 63 TB to 42 TB—across all of its environments. This reduction greatly exceeded the 24 percent reduction goal that the group shared with Microsoft IT and also resulted in a $76,000 USD annual storage cost savings. Aside from storage cost avoidance, the reclaimed storage space will provide greater growth capacity over time and has greatly improved performance.

There was a 100-GB reduction in the size of production database backups, with a 1.4-TB saving across all environments. Because the production data footprint was smaller, there were time savings during the monthly environment refresh and while copying backups from production to preproduction environments. That time saving is estimated to be roughly 60 hours annually.

Data Savings and Increased Performance

Before the initial compression of the databases in the production environment, the total size of the six selected databases was 3150 GB. After compression, the databases were reduced in size to 1555 GB total (see Table 2).

Table 2. Space Saving and Performance Increases (each I/O = 8 kilobytes)

Database                 Size before   Size after   % Reduction
OEM OLTP                 638 GB        333 GB       47.80%
    Overall average 185 million less I/O based on top 10 I/O workload (1.4 TB); max I/O reduced by 981 million (7.8 TB)
OEM Adhoc Reporting DB   825 GB        550 GB       33.33%
    Overall average 155 million less I/O based on top 10 I/O workload (1.2 TB); max I/O reduced by 300 million (2.4 TB)
SalesOut                 70 GB         27 GB        61.42%
    Overall average 233 million less I/O based on top 10 I/O workload (1.7 TB); max I/O reduced by 298 million (2.4 TB)
OEM Staging DB           432 GB        138 GB       68.05%
Data Warehouse           1045 GB       428 GB       59.04%
    Overall average 75 million less I/O based on top 10 I/O workload (600 GB); max I/O reduced by 770 million (6.1 TB)
ODS DB                   140 GB        79 GB        43.57%
    Overall average 2 million less I/O based on top 10 I/O workload (160 GB); max I/O increased by 16 million (128 GB)

Improved Performance

Performance in some cases increased by 200 percent after compression. I/O was also improved by compression, and some stored procedures improved significantly. The graphs in Figure 1 and Figure 2 represent some of the performance metrics that OEM IT tracked before and after compression.

Figure 1. I/O performance before and after compression

Disk read/write times were substantially improved. OEM IT systems perform a lot of read/write activities, and because of data compression, a smaller volume of data is read from or written to disk. Because physical I/O is expensive from a workload perspective, reduced physical I/O often results in a bigger saving than the additional CPU cost to compress and decompress data.

Figure 2. Average CPU in seconds before and after compression

Reduced Refresh and Backup Times

"While the reduction in physical storage is providing both cost and capacity surplus to the OEM IT Engineering team, by far the largest benefit is the reduction in time that it takes to move the data between pre-production environments, load the data, and prepare the environment for either development or testing needs. Now, that much more time can be used by the engineering team to drive more features and quality into releases."

Feliks Shostak
IT Director
Microsoft Corporation

For data refreshes, there are three major steps: backup, copy, and restore. The copy step is what pushed data out to the multiple environments across data centers and regional locations, and because the file sizes were so large, this process took as long as 24 hours to finish. With compression, the copy time has been reduced by almost half.

For database backups, backup compression was already being used, so when that was combined with data compression, the file sizes became that much smaller. Because of the reduced file size, backup and restore times are now shorter.
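Backup compression can be requested per backup (or set as a server default); a minimal form, with a hypothetical database name and path, looks like this:

-- Compressed backup of a production database (database name and path are hypothetical)
BACKUP DATABASE OemReporting
TO DISK = N'R:\SQLBackups\OemReporting.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;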

Conclusion

By using the compression features available in SQL Server 2008 R2, OEM IT was able to control database size growth, control costs, and improve the performance of its critical LOB applications. Since implementing compression, OEM IT has experienced a 33 percent reduction in its data storage footprint, which has in turn increased performance and overall server health.

For More Information

For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Order Centre at (800) 933-4750. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:

http://www.microsoft.com

http://www.microsoft.com/technet/itshowcase

© 2012 Microsoft Corporation. All rights reserved.

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft and SQL Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
