Single Instance Storage in Microsoft Windows Storage Server 2003 R2

A Solution for Managing Duplicate Files

Technical White Paper

Published: May 2006
Updated: February 12, 2008

Situation

Organizations face soaring demand for storage. The inefficiency of storing duplicate files exacerbates the need for storage space. Microsoft saw this problem across its organization, from data-center servers to branch offices.

Solution

Microsoft uses the Single Instance Storage (SIS) feature of Microsoft Windows Storage Server 2003 R2 to automatically identify duplicate files and manage them to reduce storage needs. The company began with deploying SIS on servers that store copies of Microsoft products. The company is expanding use of SIS to file servers.

Benefits

  • Up to 40 percent reduction of storage needs on deployments so far at Microsoft
  • 6.8 terabytes of storage space reclaimed so far in Microsoft deployment of SIS across more than 274 servers
  • Reduced main memory cache loads through reduction of duplicate files

Products & Technologies

  • Microsoft Windows Storage Server 2003 R2
  • Single Instance Storage feature of Windows Storage Server 2003 R2

On This Page

  • Executive Summary
  • Introduction
  • Value Proposition and Benefits
  • Single Instance Storage Architecture
  • Integration with Other Features
  • Best Practices
  • Conclusion

Executive Summary

This white paper describes the basic architecture of the Single Instance Storage (SIS) feature of the Microsoft® Windows® Storage Server 2003 R2 operating system. This feature helps organizations reduce storage needs for file servers by identifying duplicate files within hard disk volumes and providing an efficient mechanism for consolidating them. The paper also examines the benefits of using SIS, based upon deployment by the Microsoft Information Technology (Microsoft IT) group for Microsoft branch offices and data-center servers. The paper closes with a collection of best practices.

SIS works by searching a hard disk volume to identify duplicate files. When SIS finds identical files, it saves one copy of the file to a central repository, called the SIS Common Store, and replaces other copies with pointers to the stored versions.

The process is transparent to users. A user still sees a file name in his or her directory, and then clicks it to open the file. If the user alters the file, SIS saves this unique copy for the user and removes the pointer to the Common Store copy. Other users continue to read the unchanged copy from the Common Store.

Organizations benefit from the SIS technology in Windows Storage Server 2003 R2, because it helps to:

  • Reduce total data stored on a volume by consolidating duplicate files.
  • Reduce data cached in memory by consolidating duplicate files.
  • Reduce the data that is backed up by SIS-aware backup applications.

In its own internal deployments, Microsoft IT has found that SIS has the potential to reduce storage substantially. Microsoft IT has deployed SIS on more than 200 servers that host Microsoft products for downloading within the company. Many of these servers provide additional storage functions within the branch office setting. Microsoft IT reports that SIS has reduced storage on its file servers by 25 percent to 40 percent, depending upon the type of content stored.

Although stand-alone SIS technology is a new addition to Windows Storage Server, SIS is a well-proven technology and a key element of Microsoft Windows 2000 Remote Installation Services (RIS).

Though SIS is similar to the symbolic link feature implemented in UNIX and other operating systems, SIS differs from symbolic links in fundamental ways. SIS is a more robust solution in areas that include ease of deployment, administration, and transparency to users. In fact, SIS works automatically without any user involvement, in contrast to symbolic links, which the user must set up and maintain. SIS automatically determines when two or more files have the same content and links them together. SIS can do this linking even when the duplicate files have different names, because it bases the comparison on the actual file content.

SIS also includes an application programming interface (API) to assist developers in creating SIS-aware backup and restoration solutions, which can take advantage of the storage space reductions of SIS so that less data needs to be backed up—providing savings in terms of time, media, and offsite media storage costs.

Introduction

Electronic data has become one of the most important assets for businesses today. Demand for storage has grown steadily, driven by the data retention rules in new compliance requirements, the deployment of ever more data-intensive applications, e-commerce systems, and the growing prevalence of multimedia content. Estimated storage requirements are growing at a rate of 60 to 100 percent a year. Though storage concerns used to be chiefly a problem for enterprise-sized organizations, they are increasingly a burden to midsize and even relatively small businesses. All organizations need better solutions for provisioning and managing storage resources.

To address this problem, Microsoft introduced Windows Storage Server 2003 R2, which provides a dedicated file server optimized for storage workload based on the Microsoft Windows Server™ 2003 operating system. Windows Storage Server 2003 R2 supports Internet Small Computer System Interface (iSCSI), Fibre Channel gateway, and network attached storage (NAS) server functionality, and one of the innovative ways it addresses the management of data growth is through the SIS feature. SIS recovers disk space by reducing the redundant data stored on a volume: it identifies identical files, stores a single copy of each in the SIS Common Store, and replaces the duplicates with links to that copy.

SIS can significantly reduce file server loads by identifying and consolidating duplicate files. In 1996, Microsoft deployed an earlier version of the architecture on Microsoft Exchange Server version 4.0 to lessen the problem of thousands of users being sent identical e-mail attachments. Microsoft built upon this concept, adding among other things a more robust solution for detecting duplicate files, thereby creating SIS. Microsoft first deployed SIS as a key component of RIS for installing the Windows 2000 operating system on remote startup–enabled computers. The inclusion of SIS in Windows Storage Server 2003 R2 greatly increases the scope of the technology to cover the entire file-serving workload.

This white paper describes the value proposition and benefits of SIS, and it describes the basic architecture of the solution.

Value Proposition and Benefits

Deploying an efficient system for managing duplicate files provides a number of benefits for organizations that are seeing steep growth in the demand for file storage. Organizations of all sizes are seeing escalating storage needs—both at centralized data centers and at branch offices.

With about 75 percent of midsize organizations in the United States having an average of six branch office locations, there is an ever greater need for better manageability and efficiency in meeting storage demands—especially because most small to midsize organizations have limited IT staff.

Even when files are complete duplicates, they are separate from one another in that they may have different path names, owners, and access control lists (ACLs), and they may charge different users' disk allocation quotas. However, the fact that the files have identical contents presents an opportunity to save space on the disk and reduce the footprint of file caching within memory through use of the SIS feature of Windows Storage Server 2003 R2.

SIS provides a range of benefits, including:

  • Reduced disk space. SIS reduces disk space consumption by consolidating duplicate files.
  • Reduced file caching in memory. SIS reduces caching loads by storing just a single copy of a duplicate file per volume, regardless of how many users are concurrently accessing the file.
  • Reduced backup time. By reducing the number of duplicate files per volume, SIS can significantly reduce the total data backed up when an organization uses an SIS-aware backup and restoration application.
  • Ease of administration. SIS does not require daily maintenance.
  • Ease of use. SIS is transparent to end users and applications.
  • Backup and restoration. The SIS Backup API allows backup applications that use the API to determine whether the file is part of the SIS Common Store and back up a single copy of the file.

Walking Through SIS Operations

Walking through the basic SIS operations shows the value of the solution. Consider the following scenario:

Fifty users receive the same e-mail message with an attachment. They all save the attachment to their home folders located on the same file server volume. An SIS service called Groveler runs in the background, detecting the 50 identical files on the volume, moving one of the copies into the SIS Common Store, and replacing the other 49 files with a link to the file in the SIS Common Store.

One of the users makes a change and saves the file. During the save operation of the updated user file, SIS removes the link in the user's home folder. This process is completely transparent to the application and user.

The remaining users continue to access the single file in the Common Store. As each user modifies his or her file, SIS drops the pointer to the Common Store copy and gives the user his or her own copy.

Benefiting from Read-Only Usage

SIS provides its greatest value when files are mostly used on a read-only basis, because the less often files are written to, the greater the chance that duplicate files remain on a server. Fortunately, studies of how users interact with files show that most files are used on a read-only basis. Access to such files continues to be served by links to a single copy of each file in the SIS Common Store, which provides long-term benefits in reducing total storage.

Reducing Storage

Microsoft has more than 250 branch offices worldwide. To reduce wide area network (WAN) traffic, Microsoft deploys product servers at many of its branch offices so that users can download Microsoft applications locally. After deploying Windows Storage Server 2003 R2 at branch offices, the company found that SIS reduced storage by about 40 percent on its product installation share servers. Product servers proved to be good candidates for SIS because they contain multiple product installation folders with very similar contents, including system files and dynamic-link library (DLL) files. Microsoft IT has found storage savings in the range of 25 percent for some of its file server deployments within its data centers. Space savings are dependent upon the degree to which files are duplicated per volume. A database, for example, would likely be a poor candidate for SIS because so much of the data is unique.

The following table summarizes the space savings that SIS produced on servers used to deploy applications and software products within Microsoft.

Table 1. Space Savings Results from SIS

| Server Type | Average Space Savings (%) | Average Space Savings (GB) | # of Servers Sampled | Actual # of Servers | Total Space Savings (GB) |
|---|---|---|---|---|---|
| Client Software Install Shares – Hub | 33% | 67.64 | 22 | 34 | 2299.76 |
| Client Software Install Shares – Branch Office | 24% | 16.57 | 70 | 111 | 1839.27 |
| Server Software Install Shares | 48% | 47.28 | 21 | 34 | 1607.52 |
| International Version Product Shares | 42% | 214.8 | 2 | 2 | 859 |
| Archived Products | 63% | 545 | 2 | 2 | 1090 |
| Remote Installation Services | 40% | 3.05 | 52 | 91 | 277.55 |
| Total | 54%* | | 169 | 274 | 7973.1 |

*Weighted by average space savings
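
The column totals in Table 1 can be checked with a few lines of arithmetic. The following is a minimal Python sketch; the figures are copied from the table, while the tuple layout and the function name `summarize` are illustrative choices, not part of SIS.

```python
# Rows copied from Table 1:
# (server type, servers sampled, actual servers, total space savings in GB)
ROWS = [
    ("Client Software Install Shares - Hub", 22, 34, 2299.76),
    ("Client Software Install Shares - Branch Office", 70, 111, 1839.27),
    ("Server Software Install Shares", 21, 34, 1607.52),
    ("International Version Product Shares", 2, 2, 859.0),
    ("Archived Products", 2, 2, 1090.0),
    ("Remote Installation Services", 52, 91, 277.55),
]


def summarize(rows):
    """Return (servers sampled, actual servers, total GB saved) summed over all rows."""
    sampled = sum(row[1] for row in rows)
    actual = sum(row[2] for row in rows)
    total_gb = round(sum(row[3] for row in rows), 2)
    return sampled, actual, total_gb


sampled, actual, total_gb = summarize(ROWS)
print(sampled, actual, total_gb)
```

The sums reproduce the Total row of the table: 169 servers sampled, 274 actual servers, and 7973.1 GB of total space savings.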

Reducing Main Memory Cache Footprint

Even if disk storage prices continue to decline on a per-byte basis, it is advantageous for organizations to reduce total storage because smaller data stores are less expensive to manage and back up. An additional benefit is the performance improvement that can come from reducing the size of file caching in memory. The benefits of reducing main memory cache loads will likely become more pronounced as the ratio of processor, memory, and network speeds to disk latency increases.

Single Instance Storage Architecture

The SIS architecture includes basic components that integrate smoothly to provide SIS functionality. This section begins with an overview of the components and then examines each more closely. The basic SIS components and features include:

  • SIS Groveler. The SIS Groveler searches for files that are identical on the NTFS file system volume. It then reports those files to the SIS filter driver.
  • SIS Storage Filter. The SIS Storage Filter is a file system filter that manages the duplicate copies of files on logical volumes. This filter copies one instance of the duplicate file into the Common Store. The duplicate copies are replaced with a link to the Common Store to improve disk space utilization.
  • SIS Link. SIS links are essentially placeholders or pointers within the file system, maintaining both application and user experience (including attributes such as file size and directory path) while I/O is transparently redirected to the actual duplicate file located within the SIS Common Store.
  • SIS Common Store. The SIS Common Store serves as the repository for each file identified as having duplicates. Each SIS-maintained volume contains one SIS Common Store, which contains all of the merged duplicate files that exist on that volume.
  • SIS Administrative Interface. The SIS Administrative Interface gives network administrators easy access to all SIS controls to simplify management.
  • SIS Backup API. The SIS Backup API (Sisbkup.dll) helps OEMs create SIS-aware backup and restoration solutions.

Groveler Architecture

The SIS Groveler (so called because it grovels through the contents of the file system) is a user-level service that automatically finds identical files and reports them to the SIS Storage Filter for merging. Groveler also tracks changes to the file system.

Though Groveler is designed to do most of its work when the operating system is not busy (in background mode), it is possible to run Groveler at maximum capacity (in foreground mode) by using the Sisadmin.exe tool. After Groveler completes its work in foreground mode, it resumes normal operation in background mode.

Note: Groveler will attach to only NTFS volumes that are mounted with a drive letter. There is no support for SIS linking volumes that are mounted as mount points.

Checking for Duplicates and Monitoring Updates

Groveler works by computing the hashed signature of a file, and then comparing the signature with others that it keeps in a database to determine whether they are identical. It then reports matching files to the SIS Storage Filter so that a copy can be sent to the SIS Common Store and links can be created to replace the additional copies.

Groveler uses the NTFS Update Journal feature, which maintains a record of all recent updates to a volume. Each entry in the journal has an update sequence number (USN), ensuring that changes in files are not missed.

By monitoring the Update Journal USN entries, Groveler can detect whether a file update has not yet been processed. Groveler then updates the signatures for files that have been added or modified.

If the USN journal encounters a recoverable error, such as a journal wrap, SIS will rescan the volume and update file signatures as necessary.
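
The journal-tracking behavior described above can be modeled in a few lines. This is a simplified Python simulation, not the Groveler implementation: the names `JournalEntry`, `GrovelerState`, and `process_journal` are hypothetical, and USNs are assumed to be consecutive integers so that a gap is easy to detect (real USNs are not consecutive).

```python
from dataclasses import dataclass, field


@dataclass
class JournalEntry:
    usn: int   # update sequence number, strictly increasing
    path: str  # file that was added or modified


@dataclass
class GrovelerState:
    last_usn: int = 0                        # highest USN already processed
    stale: set = field(default_factory=set)  # paths whose signatures must be recomputed


def process_journal(state, entries, oldest_usn, all_paths):
    """Queue changed files for re-signing; fall back to a full rescan on a journal wrap.

    entries    -- journal records currently available, in USN order
    oldest_usn -- lowest USN still held by the journal
    all_paths  -- every file on the volume (used only for a rescan)
    """
    if oldest_usn > state.last_usn + 1:
        # Records after last_usn were overwritten before they were read (a journal
        # wrap), so changes may have been missed: treat every file as stale.
        state.stale.update(all_paths)
    for entry in entries:
        if entry.usn > state.last_usn:
            state.stale.add(entry.path)
            state.last_usn = entry.usn
    return state
```

Keeping only the last processed USN is enough to resume after a restart, which is why the sketch stores a single integer rather than a list of handled records.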

Using a 128-bit Signature

Groveler uses a 128-bit file signature. The first 64 bits of the signature hold the size of the file. It is inexpensive to obtain the file size, and files of differing sizes cannot be identical. Groveler computes the remaining 64 bits by running a hash function on a fixed portion of the file's contents: two 4-kilobyte (KB) chunks from the middle of the file (or the entire file, if it is 8 KB or smaller). Because the hash covers only part of each file, matching signatures do not prove that two files are identical, so when two signatures match, Groveler performs a full binary comparison.
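
The signature scheme can be sketched in Python as follows. Several details are assumptions made for illustration: the paper does not name the hash function (BLAKE2b truncated to 64 bits stands in for it here), it does not say exactly where the two 4 KB chunks sit (this sketch takes 8 KB centered on the midpoint), and the function names are hypothetical.

```python
import hashlib
import os

CHUNK = 4 * 1024  # 4 KB


def sis_style_signature(path):
    """Return a 128-bit signature as (64-bit file size, 64-bit content hash)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= 2 * CHUNK:
            sample = f.read()            # small file: hash the entire contents
        else:
            f.seek(size // 2 - CHUNK)    # assumed placement: centered on the midpoint
            sample = f.read(2 * CHUNK)   # two adjacent 4 KB chunks
    digest = hashlib.blake2b(sample, digest_size=8).digest()  # stand-in 64-bit hash
    return size, int.from_bytes(digest, "big")


def same_content(path_a, path_b, block=64 * 1024):
    """Full binary comparison, intended for use only after the signatures match."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a, block_b = a.read(block), b.read(block)
            if block_a != block_b:
                return False
            if not block_a:
                return True
```

Two files that differ only near the beginning would produce the same signature under this scheme, which is why the cheap signature check is followed by the byte-for-byte comparison.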

Storage Filter Architecture

The SIS Storage Filter is a kernel-level file system filter driver that manages the duplicate copies of files on hard disk volumes, as identified by the SIS Groveler. The SIS Storage Filter copies one instance of the duplicate file to the SIS Common Store, and replaces the files with a link to the SIS Common Store copy to improve disk usage.

The SIS Storage Filter helps ensure that users see appropriate behavior when they access SIS links. When an application tries to open the original file, the Storage Filter redirects any file input or output to the SIS file in the SIS Common Store directory. The filter driver is in all input/output (I/O) paths of volumes to which it is attached, and handles the normal file operations that happen on SIS links, such as read, write, open, close, and delete.

Creating SIS Links

The SIS Storage Filter creates SIS links. SIS links do not contain any file data, but just contain a pointer to the Common Store file. Common Store files are located in a protected directory. Because the data for SIS files is located in the Common Store rather than in any particular link file, SIS avoids the problems that would arise when a linked file is deleted or overwritten.

When SIS handles a COPYFILE request and the source file is not already an SIS link, the contents of the source file are copied to a newly created file in the Common Store, and the source file is converted into a link to that Common Store file. The destination file is then also updated to be a link to the Common Store file (either pre-existing or newly created). SIS keeps some out-of-band information (called backpointers) with each Common Store file; the backpointers record the set of links that point to that file. A COPYFILE request adds such a backpointer for the destination, and also for the source if it was not already an SIS link. (Applications may use COPYFILE to create an SIS link. If the file is already a merged duplicate, only a link is created. If the file exists elsewhere as a single copy, SIS merges it as well.)

Using Copy-on-Close

The SIS Storage Filter handles reads by redirecting them to Common Store files, and it handles writes by using a copy-on-close technique. Copy-on-close differs from copy-on-write in that the copy is delayed beyond even the time of the first write, until the complete set of updates has been made to the file; only then are the portions of the file that have not been overwritten copied from the Common Store. Copy-on-close has two advantages over copy-on-write: there is no delay for the copy at the time of the first write, and there is no need to copy the portions of the file that are overwritten.

The cost of making SIS copies of files that are already SIS links is small and independent of the size of the file. The disk-space overhead of an SIS link is about 300 bytes regardless of the size of the file to which the link refers.
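
The copy-on-close idea can be illustrated with a toy in-memory model. This Python sketch is not how the filter driver is implemented: the class `LinkedFileHandle` is hypothetical, it tracks writes byte by byte for clarity rather than efficiency, and it assumes that a gap past the end of the original data reads as zeros.

```python
class LinkedFileHandle:
    """Toy model of an open SIS link that defers copying until close."""

    def __init__(self, common_store_data: bytes):
        self._backing = common_store_data  # shared content, never modified
        self._writes = {}                  # byte offset -> new byte value
        self._size = len(common_store_data)

    def write(self, offset: int, data: bytes) -> None:
        # No copy happens here: only the written bytes are recorded.
        for i, value in enumerate(data):
            self._writes[offset + i] = value
        self._size = max(self._size, offset + len(data))

    def read(self, offset: int, length: int) -> bytes:
        # Reads see written bytes first, otherwise the Common Store content.
        end = min(offset + length, self._size)
        return bytes(self._byte_at(i) for i in range(offset, end))

    def _byte_at(self, i: int) -> int:
        if i in self._writes:
            return self._writes[i]
        return self._backing[i] if i < len(self._backing) else 0

    def close(self):
        """Return (private copy or None, bytes copied from the Common Store)."""
        if not self._writes:
            return None, 0  # untouched: the link stays in place
        copied = sum(1 for i in range(len(self._backing)) if i not in self._writes)
        private = bytes(self._byte_at(i) for i in range(self._size))
        return private, copied
```

In this model, overwriting the first 5 bytes of an 11-byte file means that only the remaining 6 bytes are taken from the shared copy at close time, and a handle that is closed without any writes leaves the link intact.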

Using SIS Links and Reparse Points

An SIS link is implemented as a sparse file of the size of the file it represents with no regions of data actually allocated on disk. Because there are no regions allocated, the file uses only as much space as is needed for its link. An SIS link has a reparse point with an SIS tag. The contents of the data portion of an SIS reparse point include:

  • Name of a Common Store file that backs the contents of the link
  • Unique identifier for the link
  • Signature of the contents of the Common Store file backing the link
  • Internal bookkeeping information

Breaking SIS Links

When a file that SIS has consolidated is modified, or its contents are replaced, the reparse point is removed and the user stores the modified copy. Other users, who have not modified their versions of the document, continue to be served by reparse points and the original copy in the SIS Common Store.

A change of time stamp or ACLs will not break a file's SIS links; SIS concerns itself with only the data portion of a file. SIS does not touch any metadata, so any metadata changes (other than file size) will not break a file's SIS links. Each SIS-linked file instance therefore still contains its correct time stamps, ACLs, and other attributes. One exception would be the case in which a user tried to attach an extended attribute to an SIS-controlled file. In that case, the file would become the user's custom copy and would no longer be accessed from the SIS Common Store. This is because NTFS does not support extended attributes and reparse points on the same file at the same time.

Note: The SIS Storage Filter cannot be stopped. If this service is disabled, users will not be able to access the linked files. If the Common Store is deleted, a loss of data will result for all linked files.

Common Store Architecture

All back-end files that SIS maintains are called Common Store files. One Common Store exists on each SIS-maintained volume and contains all of the Common Store files that exist on that volume. This directory is located in the root directory of the volume and is called \SIS Common Store. The Common Store is implemented as a directory that is by default restricted to allow access only to the system account. Only the SIS filter and backup applications need access to this directory.

The properties of a Common Store file are the following:

  • A Common Store file may have one or more links pointing to it.
  • After a Common Store file is created, its contents never change.
  • The names of Common Store files are globally unique—that is, they are unique across all volumes across all systems in the world, and the binding between a Common Store file name and its data is globally static.

When an SIS link is eliminated, either by deletion of the link or because of an overwrite, the Storage Filter removes the corresponding backpointer in the Common Store file. When all of the backpointers for a Common Store file are removed, the filter deletes the Common Store file. All of this activity occurs on a system level, and users cannot directly access the Common Store.
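
The relationship between links, backpointers, and the lifetime of a Common Store file can be sketched as a small in-memory model. This is an illustration only: the `Volume` class and its method names are hypothetical, and a random UUID stands in for the globally unique Common Store file name.

```python
import uuid


class Volume:
    """Toy model of one SIS-managed volume: ordinary files, links, and a Common Store."""

    def __init__(self):
        self.files = {}         # path -> bytes (ordinary file)
        self.links = {}         # path -> Common Store file name
        self.common_store = {}  # store name -> (content, set of backpointer paths)

    def merge(self, paths):
        """Replace identical ordinary files with links to one Common Store file."""
        content = self.files[paths[0]]
        if any(self.files[p] != content for p in paths):
            raise ValueError("only identical files can be merged")
        name = str(uuid.uuid4())  # stand-in for a globally unique name
        self.common_store[name] = (content, set(paths))
        for p in paths:
            del self.files[p]
            self.links[p] = name
        return name

    def read(self, path):
        if path in self.links:
            return self.common_store[self.links[path]][0]
        return self.files[path]

    def overwrite(self, path, content):
        """Give the writer a private copy and drop its backpointer."""
        self._unlink(path)
        self.files[path] = content

    def delete(self, path):
        self._unlink(path)
        self.files.pop(path, None)

    def _unlink(self, path):
        name = self.links.pop(path, None)
        if name is None:
            return
        _, backpointers = self.common_store[name]
        backpointers.discard(path)
        if not backpointers:  # last link gone: remove the Common Store file
            del self.common_store[name]
```

Applied to the earlier scenario of 50 users saving the same attachment, `merge` leaves one Common Store file with 50 backpointers, one `overwrite` reduces that to 49 while the other users still read the original content, and the Common Store file disappears only after the last remaining link is deleted or overwritten.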

Differences from Symbolic Links

SIS is similar to the symbolic link feature implemented in UNIX and other operating systems. However, SIS differs from symbolic links in the following three fundamental ways that greatly enhance the overall value and usability of the solution:

  • If two or more users each have their own duplicate copies of the same file on a volume managed by SIS and someone modifies one of the files, the users of the other files do not see the changes, because SIS works as an automatic copy-on-close link. The two files are linked only as long as they are identical. In contrast, with symbolic links, changes made through one of the links change the content of all links to the file.
  • The Common Store, the underlying shared disk repository that supports SIS links, is maintained by the system, and a file within it is deleted only when all the SIS links that point to it are deleted. In contrast, symbolic links can break if a user deletes the target file.
  • SIS works automatically without any user involvement, in contrast to symbolic links, which the user must set up and maintain. SIS automatically determines that two or more files have the same content and places a copy in the Common Store.

Sisadmin.exe Command Line

Sisadmin.exe gives administrators command-line functionality for deploying, configuring, and managing SIS, including activating the SIS Filter and Groveler services. Sisadmin is used for switching Groveler between background and foreground use. It also displays information about SIS files and manages error messages.

For more information about how the SIS architecture is implemented in Windows Storage Server, refer to the Windows Server 2003 Platform SDK at http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdk-full.htm. For more information about SIS architecture, refer to the Microsoft white paper Single Instance Storage in Windows 2000 at http://research.microsoft.com/sn/Farsite/WSS2000.pdf.

SIS Backup API

The SIS Backup API provides the interface for OEMs to use in creating SIS-aware backup and restoration solutions that can take advantage of the space savings that SIS enables.

Integration with Other Features

SIS integrates with other features, including backup and restoration applications and Windows Clustering.

Backup and Restoration Support

Organizations benefit when OEMs create SIS-aware backup applications by using the SIS Backup API (Sisbkup.dll). Because SIS replaces identical files with reparse points, backup programs must understand this capability of the NTFS file system in order to back up and restore these files efficiently. SIS includes the Sisbkup.dll module to support backup and restoration applications.

The advantages of using SIS and the SIS backup architecture include:

  • The SIS architecture automatically maintains the connections between the SIS links and the backing files as the backup application calls the SIS Backup API functions.
  • Because SIS is implemented as a filter driver for a file system, it constantly tracks the connections between the SIS links and the back-end files. When the files are backed up and restored, the SIS Backup API ensures that only one instance of the back-end file will be backed up and restored, regardless of the number of SIS links that point to it.

If a backup program is not aware of SIS files, it makes duplicate copies of the SIS files in the actual backup data file. When restored, these copies create normal files instead of the reparse points. The SIS Groveler eventually combines these files, but possibly not in time to prevent the restoration operation from running out of disk space.
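
The difference between the two kinds of backup can be estimated with simple arithmetic. The following Python sketch is a rough model under stated assumptions: the function `backup_bytes` and its parameters are hypothetical, and the small per-link overhead is ignored.

```python
def backup_bytes(link_targets, store_sizes, sis_aware):
    """Estimate the file data written to backup media for a set of SIS links.

    link_targets -- mapping of link path -> Common Store file name
    store_sizes  -- mapping of Common Store file name -> size in bytes
    sis_aware    -- True if each Common Store file is backed up only once
    """
    if sis_aware:
        # One copy per distinct Common Store file, however many links refer to it.
        return sum(store_sizes[name] for name in set(link_targets.values()))
    # A backup that follows every link writes the full content once per link.
    return sum(store_sizes[name] for name in link_targets.values())
```

For 50 links to a single 1 MB Common Store file, the model gives 1 MB for an SIS-aware backup and 50 MB for one that is not SIS-aware, which is also the amount of disk space a naive restoration would need.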

Backing Up SIS Links

The backup application calls Sisbkup.dll when it encounters an SIS link, and the DLL determines whether the backup needs to back up a Common Store file in response.

For restoring an SIS link, the restoration application calls the DLL, which in turn determines whether the appropriate Common Store file already exists or whether it has already reported that file to restore. If not, the restoration application reports the Common Store file that corresponds to the link being restored. Because Common Store files have universally unique file names, and their content never changes after they are created, if the Common Store file still exists on the volume, there is no need to restore over it; simply linking to it suffices.

Restoring SIS Files

SIS links are implemented as sparse files and reparse points. The structure and contents of a reparse point are opaque to backup and restoration applications. When restoring an SIS link, a restoration application should perform the following steps:

  1. Determine the Common Store file or files to which the SIS link points (by calling Sisbkup.dll).
  2. If the file or files do not exist in the Common Store, restore the file or files along with the SIS link.
  3. If the SIS link points to a Common Store file or files that exist on the disk, restore only the SIS link because the data in Common Store files never changes. So if a given Common Store file is still on the disk at restore time, it has the same contents as when it was backed up, and there is no need to overwrite it.
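
The three steps above can be sketched as a planning function. This Python sketch models only the decision logic: the name `plan_restore` is hypothetical, and step 1 (which a real application performs by calling Sisbkup.dll) is represented by passing in each link together with the name of its Common Store file.

```python
def plan_restore(links_to_restore, store_files_on_disk):
    """Return (Common Store files to restore, link paths to restore).

    links_to_restore    -- list of (link path, Common Store file name) pairs
    store_files_on_disk -- names of Common Store files already on the volume
    """
    present = set(store_files_on_disk)
    store_restores = []
    for _path, store_name in links_to_restore:
        if store_name not in present:
            # Step 2: the backing file is missing, so restore it with the link.
            store_restores.append(store_name)
            present.add(store_name)  # later links to it need no second copy
        # Step 3: a backing file that is present is never overwritten,
        # because Common Store contents do not change after creation.
    return store_restores, [path for path, _ in links_to_restore]
```

Every link is always restored; what varies is whether its backing file travels with it, and each backing file is restored at most once no matter how many links point to it.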

The only additional overhead required for SIS-assisted backups is that the backup application must back up the SIS link and the data associated with the backing files.

Note: All SIS backup and restoration operations are local to a specific volume.

Backup and restoration of encrypted files is enabled by the Raw Encryption API, which reads and writes encrypted files while keeping the data in encrypted format. The API enables the encrypted data in these files to be backed up and restored. The API also meets the goals of maintaining the security of the backed-up data.

The Windows Server 2003 Platform SDK provides more information about processing requirements for backup and restoration applications. To download this SDK, go to http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdk-full.htm.

Microsoft Clustering Support for SIS

SIS can be used with clustering, as long as all cluster nodes are running the Single Instance Storage service. If the receiving node is running the SIS Storage Filter driver, the files can be accessed. If the receiving node does not have the SIS Storage Filter, the files cannot be accessed.

To back up or restore SIS files most efficiently, the receiving node requires the SIS Backup API (Sisbkup.dll), which is installed by default. The backup application must use the API to realize the media storage savings that SIS provides; otherwise, the receiving server would back up the entire file system as duplicates.

Note: When SIS is enabled on a Windows Storage Server 2003 R2 Service Pack 2 (SP2)-based server cluster or a Windows Unified Data Storage Server 2003-based server cluster with one or more disk resources, the cluster resource group may not be able to fail over between the cluster nodes. The server administrator must restart the cluster node where the failure occurs to allow the failover to continue. Visit http://support.microsoft.com/kb/947266 for more information.

Best Practices

This section includes best practices and related observations compiled from network administrators who have deployed SIS at Microsoft.

The best practices for deployment activities include the following:

  • SIS can be deployed only on local NTFS volumes.
  • SIS will actively monitor and consolidate up to six volumes. Servers with a large number of volumes should configure SIS accordingly and apply SIS to volumes that have the best potential for compression (according to the number of duplicate files).
  • SIS cannot be used on the system or boot volume or on remote drives.
  • When possible, administrators should co-locate similar content on the same volume, because SIS will not merge files across separate volumes. This practice includes file shares within the same group or department, My Documents folders, application content and media, and application installation shares.
  • SIS merges files greater than or equal to 32 KB. An organization can benefit considerably from using SIS to monitor volumes that house files much larger than 32 KB. Generally, the larger the average file size, the greater the benefit realized. Deploying SIS on a volume where files average 32 KB or less will provide the least benefit.
  • In general, SIS should not be used on volumes that are experiencing high levels of I/O, because SIS merge and unmerge operations incur additional write I/O.
  • Because SIS runs as a background service and uses an efficient algorithm for identifying duplicate files, SIS impact to CPU performance is negligible in most scenarios.
  • If a volume is compressed through SIS, manual intervention will be needed to remove SIS.
  • Administrators should be aware of possible interactions with other storage filter drivers, such as quota filters, and test platforms accordingly prior to deployment.
  • Using SIS-aware backup applications will decrease space requirements for tape and other media, in addition to decreasing network utilization during backups.
  • SIS can be used to consolidate duplicate files that have distinct file level permissions. Each duplicate file will transparently retain its own distinct security settings after SIS merges it.
  • SIS currently is available as a stand-alone service only within Windows Storage Server 2003 R2. Administrators who want to benefit from SIS technology should contact their Microsoft solution providers for details.
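
The guidance above on choosing volumes (favor volumes with many duplicates and with files well above the 32 KB merge threshold) can be checked before deployment with a rough estimate. The following is a minimal sketch, not part of SIS: it walks a directory tree, groups files of at least 32 KB by size and then by content hash, and totals the space that the extra copies occupy. The function names and the use of SHA-256 are assumptions made for illustration; the sketch does not reproduce how the SIS Groveler identifies duplicates.

```python
import hashlib
import os
from collections import defaultdict

# The paper states that SIS merges files of 32 KB or larger.
MIN_MERGE_SIZE = 32 * 1024


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def estimate_reclaimable_bytes(root, min_size=MIN_MERGE_SIZE):
    """Estimate bytes that consolidating duplicate files under root could free.

    Hypothetical helper for planning only; it never modifies any file.
    """
    # Group candidate files by size first, so only same-size files get hashed.
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                if os.path.islink(path):
                    continue
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that cannot be inspected
            if size >= min_size:
                by_size[size].append(path)

    reclaimable = 0
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        copies_per_hash = defaultdict(int)
        for path in paths:
            try:
                copies_per_hash[file_digest(path)] += 1
            except OSError:
                continue  # skip files that cannot be read
        # Every copy beyond the first could be replaced by a link.
        for count in copies_per_hash.values():
            reclaimable += (count - 1) * size
    return reclaimable
```

Running `estimate_reclaimable_bytes` against each candidate volume and comparing the totals gives a simple way to rank volumes when a server has more volumes than SIS will monitor. The result is an upper-bound style estimate only, because it ignores link overhead and any files that SIS would skip for other reasons.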

The best practices for postdeployment activities include the following:

  • An administrator should never disable the SIS Storage Filter driver unless he or she is removing SIS from a volume. The SIS Storage Filter driver is required for accessing files in the SIS Common Store.
  • If the SIS Storage Filter is inadvertently disabled, an administrator can enable the Storage Filter by using a 0x0 startup type to re-establish access to the SIS volume. For more information about using a 0x0 startup type with SIS, refer to the article "Overview of memory dump file options for Windows Server 2003, Windows XP, and Windows 2000" at http://support.microsoft.com/?kbid=254649.
  • An administrator should never delete the SIS Common Store folder unless he or she is removing SIS from all volumes on a system. The Common Store holds the only copies of SIS-identified duplicate files, and removing the Common Store would cause such files to be lost.
  • In the event of a failover to another cluster node, the SIS Groveler service will need to be restarted because Groveler does not automatically detect new volumes, including volumes that fail over. The network administrator will need to manually recycle Groveler or use a scheduling tool to recycle Groveler periodically.
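
The manual or scheduled recycling of Groveler described above can be scripted. The following is a minimal sketch that stops and then starts a Windows service with the standard `net stop` and `net start` commands. The service name `Groveler` is an assumption that should be confirmed on the target server, and the helper names are hypothetical.

```python
import subprocess

# Assumed service name for the SIS Groveler; confirm it on the target server.
SERVICE_NAME = "Groveler"


def recycle_commands(service=SERVICE_NAME):
    """Return the stop and start commands used to recycle a Windows service."""
    return [["net", "stop", service], ["net", "start", service]]


def recycle_service(service=SERVICE_NAME, run=subprocess.run):
    """Stop and then start the service, returning each command's exit code.

    The run parameter exists so the command runner can be replaced in tests.
    """
    exit_codes = []
    for command in recycle_commands(service):
        # check=False: stopping an already stopped service returns a nonzero
        # code, and the start step should still run in that case.
        result = run(command, check=False)
        exit_codes.append(result.returncode)
    return exit_codes
```

A script like this could be run by a scheduling tool at a fixed interval, or triggered after a failover, so that Groveler picks up volumes that have moved to the node. It must run with administrative rights on the node that now owns the volumes.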

Conclusion

The SIS feature of Microsoft Windows Storage Server 2003 R2 provides an important resource for organizations that are searching for ways to reduce demands for storage resources. Internal deployments on more than 200 servers at Microsoft found an average storage reduction of 25 to 40 percent, enabling the company to reclaim 6.8 terabytes of storage space. By replacing duplicate files with links that point to a single copy in the SIS Common Store, organizations also reduce main memory cache loads, which in turn reduces overhead and server resource requirements.

An organization can use the SIS Backup API to create SIS-aware backup and restoration applications that take advantage of the storage savings of SIS, so that the organization needs shorter backup windows and less backup media. An organization can also deploy SIS as part of a Windows Clustering cluster, enhancing data availability.

SIS is easy for organizations to adopt because it is completely transparent to users, who continue to interact with files as if the files were still stored in their original locations. The efficiency of SIS is underscored by the fact that about 79 percent of files are used in a read-only fashion. At the same time, SIS gracefully handles changes to files: it removes an altered copy from the Common Store and simply stores it as the nonduplicate file it has become.

For More Information

For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information through the World Wide Web, go to:

http://www.microsoft.com

To see other Microsoft IT Showcase white papers, go to:

http://www.microsoft.com/technet/itshowcase

To visit the Windows Storage Server 2003 home page, go to:

www.microsoft.com/windowsserversystem/wss2003/default.mspx

To download the Windows Server 2003 Platform SDK, go to:

http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdk-full.htm

To find more information on SIS architecture, go to:

http://research.microsoft.com/sn/Farsite/WSS2000.pdf

© 2009 Microsoft Corporation. All rights reserved.