Single Instance Storage in Microsoft Windows Storage Server 2003 R2
A Solution for Managing Duplicate Files
Technical White Paper
Published: May 2006
Updated: February 12, 2008
Situation
Organizations face soaring demand for storage. The inefficiency of storing duplicate files exacerbates the need for storage space. Microsoft
saw this problem across its organization, from data-center servers to branch offices.
Solution
Microsoft uses the Single Instance Storage (SIS) feature of Microsoft Windows Storage Server 2003 R2 to automatically identify duplicate
files and manage them to reduce storage needs. The company began by deploying SIS on servers that store copies of Microsoft products and
is expanding its use of SIS to file servers.
Benefits
- Up to 40 percent reduction of storage needs on deployments so far at Microsoft
- 6.8 terabytes of storage space reclaimed so far in the Microsoft deployment of SIS across more than 274 servers
- Reduced main memory cache loads through reduction of duplicate files
Products & Technologies
- Microsoft Windows Storage Server 2003 R2
- Single Instance Storage feature of Windows Storage Server 2003 R2
Executive Summary
This white paper describes the basic architecture of the
Single Instance Storage (SIS) feature of the Microsoft® Windows® Storage Server 2003
R2 operating system. This feature helps organizations reduce storage needs for file
servers by identifying duplicate files within hard disk volumes and providing an
efficient mechanism for consolidating them. The paper also examines the benefits
of using SIS, based upon deployment by the Microsoft Information Technology (Microsoft
IT) group for Microsoft branch offices and data-center servers. The paper closes
with a collection of best practices.
SIS works by searching a hard disk volume to identify duplicate files. When SIS
finds identical files, it saves one copy of the file to a central repository, called
the SIS Common Store, and replaces the duplicates with pointers to the stored copy.
The process is transparent to users: a user still sees the file name in his or her
directory and opens the file as before. If the user alters the file, SIS
saves this unique copy for the user and removes the pointer to the Common Store
copy. Other users continue to read the unchanged copy from the Common Store.
Organizations benefit from the SIS technology in Windows Storage Server 2003 R2
because it helps to:
- Reduce total data stored on a volume by consolidating duplicate files.
- Reduce data cached in memory by consolidating duplicate files.
- Reduce the data that is backed up by SIS-aware backup applications.
In its own internal deployments, Microsoft IT has found that SIS has the potential
to reduce storage substantially. Microsoft IT has deployed SIS on more than 200
servers that host Microsoft products for downloading within the company. Many of
these servers provide additional storage functions within the branch office setting.
Microsoft IT reports that SIS has reduced storage on its file servers by 25 percent
to 40 percent, depending upon the type of content stored.
Although stand-alone SIS is a new addition to Windows Storage Server, the underlying
technology is well proven: SIS is a key element of Microsoft Windows 2000
Remote Installation Services (RIS).
Though SIS is similar to the symbolic link feature implemented in UNIX and other
operating systems, SIS differs from symbolic links in fundamental ways. SIS is a
more robust solution in areas that include ease of deployment, administration, and
transparency to users. In fact, SIS works automatically without any user involvement,
in contrast to symbolic links, which the user must set up and maintain. SIS automatically
determines when two or more files have the same content and links them together.
SIS can do this linking even when the duplicate files have different names, because
it bases the comparison on the actual file content.
SIS also includes an application programming interface (API) to assist developers
in creating SIS-aware backup and restoration solutions. Such solutions take advantage
of the storage space reductions of SIS so that less data needs to be backed up, which
saves time, media, and offsite media storage costs.
Introduction
Electronic data has become one of the most important assets for businesses today.
Demand for storage has grown steadily, driven by the data retention rules of new
compliance regulations, the deployment of ever more data-intensive applications and
e-commerce systems, and the growing prevalence of multimedia content. Estimated
storage requirements are growing at a rate of 60 to 100 percent a year. Storage
concerns were once chiefly a problem for enterprise-sized organizations, but they are
increasingly a burden to midsize and even relatively small businesses. All
organizations need better solutions for provisioning and managing storage resources.
To address this problem, Microsoft introduced Windows Storage Server 2003 R2,
which provides a dedicated file server optimized for storage workloads and based on the
Microsoft Windows Server™ 2003 operating system. In addition to supporting
Internet Small Computer System Interface (iSCSI), Fibre Channel gateway, and network-attached
storage (NAS) server functionality, Windows Storage Server 2003 R2 offers an innovative
way to manage data growth: the SIS feature. SIS recovers disk space by reducing the
amount of redundant data stored on a volume. It identifies identical files, stores only
a single copy of each in the SIS Common Store, and replaces the duplicate files with
links to that single copy.
SIS can significantly reduce file server loads by identifying and consolidating
duplicate files. In 1996, Microsoft deployed an earlier version of the architecture
on Microsoft Exchange Server version 4.0 to lessen the problem of thousands
of users being sent identical e-mail attachments. Microsoft built upon this concept,
adding, among other things, a more robust mechanism for detecting duplicate files,
and thereby created SIS. Microsoft first deployed SIS as a key component of RIS for
installing the Windows 2000 operating system on remote startup–enabled computers.
The inclusion of SIS in Windows Storage Server 2003 R2 greatly increases the
scope of the technology to cover the entire file-serving workload.
This white paper describes the value proposition and benefits of SIS, and it describes
the basic architecture of the solution.
Value Proposition and Benefits
Deploying an efficient system for managing duplicate files provides a number of
benefits for organizations that are seeing steep growth in the demand for file storage.
Organizations of all sizes are seeing escalating storage needs—both at centralized
data centers and at branch offices.
About 75 percent of midsize organizations in the United States have an average
of six branch office locations, so there is an ever greater need for better manageability
and efficiency in meeting storage demands, especially because most small to midsize
organizations have limited IT staff.
Even when files are complete duplicates, they are separate from one another in that
they may have different path names, owners, and access control lists (ACLs), and
they may count against different users' disk quotas. However, because the files
have identical contents, the SIS feature of Windows Storage Server 2003 R2 can save
space on the disk and reduce the footprint of file caching in memory.
SIS provides a range of benefits, including:
- Reduced disk space. SIS reduces disk space consumption by consolidating duplicate files.
- Reduced file caching in memory. SIS reduces caching loads by storing just a single copy of a duplicate file per volume, regardless of how many users are concurrently accessing the file.
- Reduced backup time. By reducing the number of duplicate files per volume, SIS can significantly reduce the total data backed up when an organization uses an SIS-aware backup and restoration application.
- Ease of administration. SIS does not require daily maintenance.
- Ease of use. SIS is transparent to end users and applications.
- Backup and restoration. The SIS Backup API allows backup applications to determine whether a file is part of the SIS Common Store and to back up a single copy of it.
Walking Through SIS Operations
Walking through the basic SIS operations shows the value of the solution. Consider
the following scenario:
Fifty users receive the same e-mail message with an attachment. They all save the
attachment to their home folders located on the same file server volume. An SIS
service called Groveler runs in the background and detects the 50 identical files
on the volume. SIS then copies one instance into the SIS Common Store and replaces
the files in the users' home folders with links to that copy.
One of the users makes a change and saves the file. When the updated file is saved,
SIS removes the link in that user's home folder and stores the user's own copy. This
process is completely transparent to the application and user.
The remaining users continue to access the single file in the Common Store. As each
user modifies his or her file, SIS drops the pointer to the Common Store copy and
gives the user his or her own copy.
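The scenario above can be modeled in a few lines. The following Python sketch is a hypothetical, in-memory illustration of the observable behavior only (one shared copy, and a link that breaks when its user saves changes); it is not how the kernel-mode filter works, and every class and method name in it is illustrative. It also uses a full-content SHA-256 hash as a stand-in for Groveler's signature-and-compare step.

```python
import hashlib


class ToyVolume:
    """Hypothetical in-memory model of SIS behavior on a single volume."""

    def __init__(self):
        self.files = {}         # path -> bytes, for ordinary (unlinked) files
        self.links = {}         # path -> Common Store key, for SIS links
        self.common_store = {}  # Common Store key -> bytes (one copy per content)

    def save(self, path, data):
        # Saving over a link breaks it: the user gets an ordinary private copy.
        self.links.pop(path, None)
        self.files[path] = data

    def read(self, path):
        # Reads through a link are served from the shared Common Store copy.
        if path in self.links:
            return self.common_store[self.links[path]]
        return self.files[path]

    def grovel(self):
        # Stand-in for Groveler plus the Storage Filter: group ordinary files by
        # content and turn every member of a duplicate group into a link.
        groups = {}
        for path, data in self.files.items():
            groups.setdefault(hashlib.sha256(data).hexdigest(), []).append(path)
        for key, paths in groups.items():
            if len(paths) < 2:
                continue
            self.common_store[key] = self.files[paths[0]]
            for path in paths:
                del self.files[path]
                self.links[path] = key

    def bytes_stored(self):
        # Counts file data only; link overhead is ignored, and unreferenced
        # Common Store entries are never cleaned up in this toy model.
        return sum(map(len, self.files.values())) + sum(
            map(len, self.common_store.values())
        )


vol = ToyVolume()
attachment = b"x" * 100_000
for i in range(50):
    vol.save(f"user{i}/attachment.doc", attachment)
before = vol.bytes_stored()         # 5,000,000 bytes: 50 full copies
vol.grovel()
after_merge = vol.bytes_stored()    # 100,000 bytes: one Common Store copy
vol.save("user7/attachment.doc", attachment + b" edited")
after_edit = vol.bytes_stored()     # 200,007 bytes: shared copy plus user7's copy
```

The other 49 users still read the shared copy after user7's edit, which is why storage grows by only one file.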
Benefiting from Read-Only Usage
SIS provides its greatest value when files are mostly used on a read-only basis,
because files that are rarely written to stay identical, so their duplicates remain
consolidated. Fortunately, studies of how users interact with files show that
most files are used on a read-only basis. As a result, most file access continues
to be served through links to a single copy of each file in the SIS
Common Store, providing long-term benefits in reducing total storage.
Reducing Storage
Microsoft has more than 250 branch offices worldwide. To reduce wide area network
(WAN) traffic, Microsoft deploys product servers at many of its branch offices so
that users can download Microsoft applications locally. After deploying Windows
Storage Server 2003 R2 at branch offices, the company found that SIS reduced
storage by about 40 percent on its product installation share servers. Product servers
proved to be good candidates for SIS because they contain multiple product installation
folders with very similar contents, including system files and dynamic-link library
(DLL) files. Microsoft IT has found storage savings in the range of 25 percent for
some of its file server deployments within its data centers. Space savings depend
on how many duplicate files each volume contains. A database, for example,
would likely be a poor candidate for SIS because so much of the data is unique.
The following table summarizes the space savings from using SIS on servers
used to deploy applications and software products within Microsoft.
Table 1. Space Savings Results from SIS
Server Type | Average Space Savings (%) | Average Space Savings per Server (GB) | Servers Sampled | Actual Servers | Total Space Savings (GB)
Client Software Install Shares (Hub) | 33 | 67.64 | 22 | 34 | 2299.76
Client Software Install Shares (Branch Office) | 24 | 16.57 | 70 | 111 | 1839.27
Server Software Install Shares | 48 | 47.28 | 21 | 34 | 1607.52
International Version Product Shares | 42 | 214.8 | 2 | 2 | 859
Archived Products | 63 | 545 | 2 | 2 | 1090
Remote Installation Services | 40 | 3.05 | 52 | 91 | 277.55
Total | 54* | | 169 | 274 | 7973.1
*Weighted by average space savings per server (GB)
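The starred 54 percent in Table 1 is not a simple average of the row percentages (that would be about 42 percent). It can be reproduced by weighting each row's percentage by its average space savings in GB, as the footnote indicates. The short Python check below uses only the figures in the table; the variable names are illustrative.

```python
# Figures copied from Table 1: (server type, avg savings %, avg savings GB per server,
# servers sampled, actual servers, total savings GB)
rows = [
    ("Client Software Install Shares (Hub)", 33, 67.64, 22, 34, 2299.76),
    ("Client Software Install Shares (Branch Office)", 24, 16.57, 70, 111, 1839.27),
    ("Server Software Install Shares", 48, 47.28, 21, 34, 1607.52),
    ("International Version Product Shares", 42, 214.8, 2, 2, 859),
    ("Archived Products", 63, 545, 2, 2, 1090),
    ("Remote Installation Services", 40, 3.05, 52, 91, 277.55),
]

# Row percentages weighted by each row's average savings in GB (the table's footnote).
weighted_pct = sum(pct * gb for _, pct, gb, *_ in rows) / sum(gb for _, _, gb, *_ in rows)
# Unweighted mean of the row percentages, for comparison.
simple_pct = sum(pct for _, pct, *_ in rows) / len(rows)

sampled = sum(r[3] for r in rows)
actual = sum(r[4] for r in rows)
total_gb = sum(r[5] for r in rows)

print(f"weighted: {weighted_pct:.1f}%  simple: {simple_pct:.1f}%")  # weighted: 54.1%  simple: 41.7%
print(f"servers sampled: {sampled}, actual: {actual}, total savings: {total_gb:.1f} GB")
```

Because the Archived Products row has by far the largest per-server savings, it dominates the weighted figure.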
Reducing Main Memory Cache Footprint
Even if disk storage prices continue to decline on a per-byte basis, it is advantageous
for organizations to reduce total storage because it is less expensive to manage
and back up smaller data stores. Reducing the amount of file data cached in memory
can also improve performance. The benefits of reducing main memory cache loads will
likely become more pronounced as processor, memory, and network speeds continue to
outpace improvements in disk latency.
Single Instance Storage Architecture
The SIS architecture includes basic components that integrate smoothly to provide
SIS functionality. This section begins with an overview of the components and then
examines each more closely. The basic SIS components and features include:
- SIS Groveler. The SIS Groveler searches for identical files on an NTFS file system volume and reports them to the SIS Storage Filter.
- SIS Storage Filter. The SIS Storage Filter is a file system filter that manages the duplicate copies of files on logical volumes. This filter copies one instance of the duplicate file into the Common Store and replaces the duplicate copies with links to the Common Store to improve disk space utilization.
- SIS Link. SIS links are essentially placeholders or pointers within the file system. They preserve the application and user experience (including attributes such as file size and directory path) while I/O is transparently redirected to the single copy of the file in the SIS Common Store.
- SIS Common Store. The SIS Common Store serves as the repository for each file identified as having duplicates. Each SIS-maintained volume contains one SIS Common Store, which holds all of the merged duplicate files on that volume.
- SIS Administrative Interface. The SIS Administrative Interface gives network administrators easy access to all SIS controls to simplify management.
- SIS Backup API. The SIS Backup API (Sisbkup.dll) helps OEMs create SIS-aware backup and restoration solutions.
Groveler Architecture
The SIS Groveler (so called because it grovels through the contents of the file
system) is a user-level service that automatically finds identical files and reports
them to the SIS Storage Filter for merging. Groveler also tracks changes to the
file system.
Though Groveler is designed to do most of its work when the operating system is
not busy (in background mode), administrators can use the Sisadmin.exe tool to run
Groveler at maximum capacity (in foreground mode). After Groveler completes its work
in foreground mode, it resumes normal operation in background mode.
Note: Groveler attaches only to NTFS volumes that are assigned a drive letter.
SIS linking is not supported on volumes that are mounted at a mount point instead
of a drive letter.
Checking for Duplicates and Monitoring Updates
Groveler computes a hashed signature for each file and compares it with the
signatures it keeps in a database to find files that may be identical. After
confirming that candidate files are identical, it reports them to the SIS Storage
Filter so that one copy can be placed in the SIS Common Store and the files can be
replaced with links to it.
Groveler uses the NTFS Update Journal feature (also called the USN change journal),
which maintains a record of all recent updates to a volume. Each entry in the journal
has an update sequence number (USN), and USNs increase with each new entry. By tracking
the USN entries it has already processed, Groveler can detect file updates that it has
not yet processed, and it then updates the signatures for files that have been added
or modified.
If the USN journal encounters a recoverable error, such as a journal wrap, SIS will
rescan the volume and update file signatures as necessary.
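The bookkeeping described above can be sketched in Python. This sketch does not read a real NTFS journal; the journal is modeled as a hypothetical list of (USN, path) records with consecutive integer USNs, and all names are illustrative. A journal wrap is detected when the oldest USN still retained is newer than the first USN the groveler has not yet processed, in which case the sketch falls back to rescanning the whole volume.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Hypothetical stand-in for a volume's change journal."""
    records: list = field(default_factory=list)  # (usn, path) pairs, USNs increasing
    first_usn: int = 0   # lowest USN still retained in the journal
    next_usn: int = 0    # USN that the next record will receive

    def append(self, path):
        self.records.append((self.next_usn, path))
        self.next_usn += 1

    def wrap(self, keep_last):
        # Simulate a journal wrap: the oldest records are discarded.
        del self.records[: max(0, len(self.records) - keep_last)]
        self.first_usn = self.records[0][0] if self.records else self.next_usn


class ToyGroveler:
    def __init__(self, journal, list_volume):
        self.journal = journal
        self.list_volume = list_volume  # callable returning every path on the volume
        self.resume_usn = 0             # first USN not yet processed
        self.dirty = set()              # paths whose signatures must be recomputed
        self.rescans = 0

    def poll(self):
        if self.resume_usn < self.journal.first_usn:
            # Records this groveler never saw were discarded: rescan the volume.
            self.rescans += 1
            self.dirty.update(self.list_volume())
        else:
            for usn, path in self.journal.records:
                if usn >= self.resume_usn:
                    self.dirty.add(path)
        self.resume_usn = self.journal.next_usn


journal = ToyJournal()
groveler = ToyGroveler(journal, lambda: {"a", "b", "c"})
journal.append("a")
journal.append("b")
groveler.poll()                 # picks up "a" and "b" from the journal
first_pass = set(groveler.dirty)
groveler.dirty.clear()
for _ in range(5):
    journal.append("c")
journal.wrap(keep_last=2)       # USNs 2 through 4 are lost
groveler.poll()                 # resume_usn (2) < first_usn (5): full rescan
```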
Using a 128-bit Signature
Groveler uses a 128-bit file signature. The first 64 bits of the signature hold
the size of the file. The file size is inexpensive to obtain, and files that
have differing sizes cannot be identical. Groveler computes the remaining
64 bits by running a hash function on a fixed portion of the file's contents: it
hashes two 4-kilobyte (KB) chunks from the middle of the file, or the entire file
if the file is 8 KB or smaller. Because matching signatures do not prove that two
files are identical, Groveler performs a full binary comparison of files whose
signatures match.
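The following Python sketch follows the scheme described above: a 64-bit size field plus a 64-bit hash of sampled content, followed by a full comparison before files are treated as duplicates. The paper does not name the hash function or the exact offsets of the two 4 KB chunks, so this sketch assumes BLAKE2b truncated to 64 bits (`hashlib.blake2b(..., digest_size=8)`) and two adjacent chunks centered on the file's midpoint. The function names are hypothetical.

```python
import filecmp
import hashlib
import os
import tempfile
from collections import defaultdict

CHUNK = 4 * 1024  # 4 KB


def signature(path):
    """Return a 128-bit signature: file size in the high 64 bits, hash in the low 64.

    Assumptions (not specified in this paper): BLAKE2b truncated to 64 bits is
    the hash, and the two 4 KB chunks are the 8 KB centered on the midpoint.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= 2 * CHUNK:
            sample = f.read()              # 8 KB or smaller: hash the entire file
        else:
            f.seek(size // 2 - CHUNK)      # two adjacent 4 KB chunks from the middle
            sample = f.read(2 * CHUNK)
    digest = hashlib.blake2b(sample, digest_size=8).digest()
    return (size << 64) | int.from_bytes(digest, "big")


def find_duplicates(paths):
    """Group candidate files by signature, then confirm with a full binary comparison."""
    by_signature = defaultdict(list)
    for path in paths:
        by_signature[signature(path)].append(path)
    groups = []
    for candidates in by_signature.values():
        while candidates:
            first, rest = candidates[0], candidates[1:]
            same = [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if same:
                groups.append([first] + same)
            candidates = [p for p in rest if p not in same]
    return groups


# "b.bin" differs from "a.bin" only in its last byte, so it has the same size and
# the same middle 8 KB (hence the same signature); only the full comparison
# tells them apart. File names are matched regardless of what they are called.
with tempfile.TemporaryDirectory() as d:
    data = os.urandom(50_000)
    variants = {
        "a.bin": data,
        "copy of a.bin": data,
        "b.bin": data[:-1] + bytes([data[-1] ^ 0xFF]),
    }
    paths = []
    for name, content in variants.items():
        path = os.path.join(d, name)
        with open(path, "wb") as f:
            f.write(content)
        paths.append(path)
    duplicate_names = [sorted(os.path.basename(p) for p in g) for g in find_duplicates(paths)]
```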
Storage Filter Architecture
The SIS Storage Filter is a kernel-level file system filter driver that manages
the duplicate copies of files on hard disk volumes, as identified by the SIS Groveler.
The SIS Storage Filter copies one instance of the duplicate file to the SIS Common
Store, and replaces the files with a link to the SIS Common Store copy to improve
disk usage.
The SIS Storage Filter helps ensure that users see appropriate behavior when they
access SIS links. When an application opens the original file, the Storage
Filter redirects reads to the backing file in the SIS Common Store directory and
handles writes by using copy-on-close. The filter driver is in all input/output (I/O)
paths of volumes to which it is attached, and it handles the normal file operations
that happen on SIS links, such as read, write, open, close, and delete.
Creating SIS Links
The SIS Storage Filter creates SIS links. SIS links do not contain any file data,
but just contain a pointer to the Common Store file. Common Store files are located
in a protected directory. Because the data for SIS files is located in the Common
Store rather than in any particular link file, SIS avoids the problems that would
arise when a linked file is deleted or overwritten.
SIS links are created by a COPYFILE request, which names a source file and a
destination file. If the source file is not already an SIS link, its contents are
copied to a newly created file in the Common Store, and the source file is converted
into a link to that Common Store file. The destination file is then also made a link
to the Common Store file (either pre-existing or newly created). SIS keeps some
out-of-band information (called backpointers) associated with each Common Store file;
the backpointers record the set of links that point to that file. A
COPYFILE request adds a backpointer for the destination, and also one for the
source if the source was not already an SIS link.
(Applications may use COPYFILE to create an SIS link. If the file is already a merged
duplicate, only a link is created. If the file exists elsewhere as a single copy,
SIS merges it as well.)
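The COPYFILE steps and backpointer bookkeeping above can be modeled as follows. This is a hypothetical, in-memory Python sketch, not the kernel implementation or its control interface; the class, the method names, and the `.sis` naming pattern are illustrative. New Common Store names come from `uuid.uuid4()` to reflect the requirement that names be globally unique. For brevity, the sketch assumes the destination is not already an SIS link.

```python
import uuid


class ToySisVolume:
    """Hypothetical in-memory model of COPYFILE and backpointer bookkeeping."""

    def __init__(self):
        self.plain = {}         # path -> bytes, for ordinary files
        self.link_target = {}   # path -> Common Store file name, for SIS links
        self.store = {}         # Common Store file name -> bytes (never modified)
        self.backpointers = {}  # Common Store file name -> set of linking paths

    def copyfile(self, source, destination):
        """Make `destination` an SIS link sharing the contents of `source`."""
        if destination in self.link_target:
            raise ValueError("this sketch assumes the destination is not already a link")
        if source in self.link_target:
            # Source is already a link: reuse its Common Store file.
            cs_name = self.link_target[source]
        else:
            # Copy the source's contents into a new, uniquely named Common Store
            # file, convert the source into a link, and record its backpointer.
            cs_name = f"{uuid.uuid4()}.sis"
            self.store[cs_name] = self.plain.pop(source)
            self.link_target[source] = cs_name
            self.backpointers[cs_name] = {source}
        # Any ordinary data at the destination is replaced by the link.
        self.plain.pop(destination, None)
        self.link_target[destination] = cs_name
        self.backpointers[cs_name].add(destination)
        return cs_name


vol = ToySisVolume()
vol.plain.update({"a.doc": b"same", "b.doc": b"same", "c.doc": b"same"})
cs = vol.copyfile("a.doc", "b.doc")  # a.doc was ordinary: new Common Store file
vol.copyfile("b.doc", "c.doc")       # b.doc is already a link: only c.doc's backpointer is added
```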
Using Copy-on-Close
The SIS Storage Filter handles reads by redirecting them to Common Store files,
and it handles writes by using a copy-on-close technique. Copy-on-close differs
from copy-on-write in that the copy is delayed past the first write until the file
is closed, after the complete set of updates has been made; then only the portions
of the file that have not been overwritten are copied from the Common Store.
Copy-on-close has two advantages over copy-on-write: there is no delay for the copy
at the time of the first write, and there is no need to copy the portions of the
file that are overwritten.
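A minimal Python sketch of copy-on-close, with hypothetical names: writes to a linked file go into a private overlay and nothing is copied at write time; when the file is closed, only the bytes that were never overwritten are copied from the Common Store. Per-byte flags are used for clarity (a real filter would track byte ranges), and the sketch does not model writes that extend the file.

```python
class CopyOnCloseFile:
    """Hypothetical model of a write handle on an SIS link."""

    def __init__(self, common_store_data):
        self.backing = common_store_data                 # immutable Common Store contents
        self.overlay = bytearray(len(common_store_data)) # the user's private copy in progress
        self.written = [False] * len(common_store_data)  # per-byte "overwritten" flags
        self.bytes_copied_at_close = 0

    def write(self, offset, data):
        # No copy happens here, not even on the first write.
        end = offset + len(data)
        if end > len(self.backing):
            raise ValueError("this sketch does not model extending the file")
        self.overlay[offset:end] = data
        for i in range(offset, end):
            self.written[i] = True

    def read(self, offset, length):
        # Overwritten bytes come from the overlay; the rest from the Common Store.
        return bytes(
            self.overlay[i] if self.written[i] else self.backing[i]
            for i in range(offset, offset + length)
        )

    def close(self):
        # Copy only the bytes that were never overwritten.
        for i, was_written in enumerate(self.written):
            if not was_written:
                self.overlay[i] = self.backing[i]
                self.bytes_copied_at_close += 1
        return bytes(self.overlay)  # the user's own copy; the link is now broken


f = CopyOnCloseFile(b"A" * 10)
f.write(0, b"XYZ")
f.write(7, b"QQQ")
private = f.close()  # b"XYZAAAAQQQ"; only 4 bytes were copied from the Common Store
```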
The cost of making SIS copies of files that are already SIS links is small and independent
of the size of the file. The disk-space overhead of an SIS link is about 300 bytes,
regardless of the size of the file to which the link refers.
Using SIS Links and Reparse Points
An SIS link is implemented as a sparse file whose size equals that of the file it
represents, with no data regions actually allocated on disk. Because no data regions
are allocated, the file uses only the space needed for the link itself. An SIS link
has a reparse point with an SIS tag. The data portion of an SIS reparse point
contains:
- Name of a Common Store file that backs the contents of the link
- Unique identifier for the link
- Signature of the contents of the Common Store file backing the link
- Internal bookkeeping information
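The items listed above can be pictured as a small record attached to a sparse file. The Python dataclasses below are only a hypothetical illustration of that relationship; the real on-disk layout of the SIS reparse data is opaque and is not reproduced here, and the field names, types, and example values are assumptions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SisReparseData:
    """Illustrative stand-in for the data portion of an SIS reparse point."""
    common_store_file: str   # name of the Common Store file backing the link
    link_id: uuid.UUID       # unique identifier for this link
    signature: int           # signature of the Common Store file's contents
    bookkeeping: dict = field(default_factory=dict)  # internal bookkeeping (opaque)


@dataclass
class SisLinkFile:
    """An SIS link: a sparse file with the full logical size but no allocated data."""
    logical_size: int         # size that users and applications see
    reparse: SisReparseData   # reparse point carrying the SIS tag's data
    allocated_data_bytes: int = 0


link = SisLinkFile(
    logical_size=10 * 1024 * 1024,
    reparse=SisReparseData("example.sis", uuid.uuid4(), signature=0),
)
```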
Breaking SIS Links
When a file that SIS has consolidated is modified, or its contents are replaced,
SIS removes the reparse point, and the file becomes an ordinary file that holds the
user's modified copy. Other users, who have not modified their versions of the
document, continue to be served by reparse points and the original copy in the SIS
Common Store.
A change of time stamp or ACLs will not break a file's SIS link. SIS concerns itself
with only the data portion of a file and does not touch any metadata, so metadata
changes (other than changes to file size) do not break a file's SIS link. Each
SIS-linked file instance therefore still contains its correct time stamps, ACLs, and
other attributes. One exception is an attempt to attach an extended attribute to an
SIS-controlled file. In that case, the file becomes the user's own copy and is no
longer accessed from the SIS Common Store, because NTFS does not support extended
attributes and reparse points on the same file at the same time.
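The rules above reduce to a simple decision: changes to a file's data (including its size) break the link, other metadata changes do not, and adding an extended attribute does, because NTFS cannot keep an extended attribute and a reparse point on the same file. A Python sketch with hypothetical change categories:

```python
from enum import Enum, auto


class Change(Enum):
    WRITE_DATA = auto()
    REPLACE_CONTENTS = auto()
    CHANGE_SIZE = auto()
    SET_TIMESTAMP = auto()
    SET_ACL = auto()
    SET_ATTRIBUTES = auto()
    ADD_EXTENDED_ATTRIBUTE = auto()


# Changes that leave the SIS link intact: per-link metadata only.
KEEPS_LINK = {Change.SET_TIMESTAMP, Change.SET_ACL, Change.SET_ATTRIBUTES}


def breaks_sis_link(change: Change) -> bool:
    """True if the change turns the link into the user's own private copy."""
    return change not in KEEPS_LINK


print(sorted(c.name for c in Change if breaks_sis_link(c)))
```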
Note: The SIS Storage Filter cannot be stopped.
If this service is disabled, users will not be able to access the linked files.
If the Common Store is deleted, a loss of data will result for all linked files.
Common Store Architecture
All back-end files that SIS maintains are called Common Store files. One Common
Store exists on each SIS-maintained volume and contains all of the Common Store
files on that volume. The Common Store is a directory named \SIS Common Store in the
root directory of the volume; by default, access to it is restricted to the system
account. Only the SIS filter and backup applications need access to this directory.
The properties of a Common Store file are the following:
- A Common Store file may have one or more links pointing to it.
- After a Common Store file is created, its contents never change.
- The names of Common Store files are globally unique—that is, they
are unique across all volumes across all systems in the world, and the binding between
a Common Store file name and its data is globally static.
When an SIS link is eliminated, either by deletion of the link or because of an
overwrite, the Storage Filter removes the corresponding backpointer in the Common
Store file. When all of the backpointers for a Common Store file are removed, the
filter deletes the Common Store file. All of this activity occurs on a system level,
and users cannot directly access the Common Store.
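This cleanup rule amounts to reference counting, with the backpointer set serving as the list of references. A minimal, hypothetical Python sketch (names are illustrative; the real filter keeps backpointers as out-of-band data on the Common Store file itself):

```python
class ToyCommonStore:
    """Hypothetical model of Common Store files and their backpointers."""

    def __init__(self):
        self.data = {}          # Common Store file name -> contents
        self.backpointers = {}  # Common Store file name -> set of link ids

    def add(self, name, contents, link_ids):
        self.data[name] = contents
        self.backpointers[name] = set(link_ids)

    def remove_link(self, name, link_id):
        """Called when a link is deleted or overwritten."""
        self.backpointers[name].discard(link_id)
        if not self.backpointers[name]:
            # Last backpointer gone: the Common Store file is no longer needed.
            del self.backpointers[name]
            del self.data[name]


store = ToyCommonStore()
store.add("shared.sis", b"contents", {"link-1", "link-2"})
store.remove_link("shared.sis", "link-1")
still_present = "shared.sis" in store.data   # link-2 still refers to it
store.remove_link("shared.sis", "link-2")
gone = "shared.sis" not in store.data        # last backpointer removed
```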
Differences from Symbolic Link
SIS is similar to the symbolic link feature implemented in UNIX and other operating
systems. However, SIS differs from symbolic links in the following three fundamental
ways that greatly enhance the overall value and usability of the solution:
- If two or more users each have their own duplicate copies of the same
file on a volume managed by SIS and someone modifies one of the files, the users
of the other files do not see the changes, because SIS works as an automatic copy-on-close
link. The two files are linked only as long as they are identical. In contrast,
with symbolic links, changes made through one of the links change the content of
all links to the file.
- The Common Store, the underlying shared disk repository that supports
SIS links, is maintained by the system, and a file within it is deleted only when
all the SIS links that point to it have been deleted. In contrast, symbolic links can
break if a user deletes the target file.
- SIS works automatically without any user involvement, in contrast
to symbolic links, which the user must set up and maintain. SIS automatically determines
that two or more files have the same content and places a copy in the Common Store.
Sisadmin.exe Command Line
Sisadmin.exe gives administrators command-line functionality for deploying, configuring,
and managing SIS, including activating the SIS Filter and Groveler services. Sisadmin
is used to switch Groveler between background and foreground mode. It also displays
information about SIS files and manages error messages.
For more information about how the SIS architecture is implemented in Windows Storage
Server, refer to the Windows Server 2003 Platform SDK at
http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdk-full.htm. For
more information about SIS architecture, refer to the Microsoft white paper
"Single Instance Storage in Windows 2000" at
http://research.microsoft.com/sn/Farsite/WSS2000.pdf.
SIS Backup API
The SIS Backup API provides the interface for OEMs to use in creating SIS-aware
backup and restoration solutions that can take advantage of the space savings that
SIS enables.
Integration with Other Features
SIS integrates with other features, including backup and restoration applications
and Windows Clustering.
Backup and Restoration Support
Organizations benefit because OEMs can create SIS-aware backup applications
by using the SIS Backup API (Sisbkup.dll), which SIS includes to support backup and
restoration applications. Because SIS replaces identical files with reparse points,
backup programs must understand this capability of the NTFS file system in order to
back up and restore these files efficiently.
The advantages of using SIS and the SIS backup architecture include:
- The SIS architecture automatically maintains the connections between
the SIS links and the backing files as the backup application calls the SIS Backup
API functions.
- Because SIS is implemented as a filter driver for a file system, it
constantly tracks the connections between the SIS links and the back-end files.
When the files are backed up and restored, the SIS Backup API ensures that only
one instance of the back-end file will be backed up and restored, regardless of
the number of SIS links that point to it.
If a backup program is not aware of SIS files, it makes duplicate copies of the
SIS files in the actual backup data file. When restored, these copies create normal
files instead of the reparse points. The SIS Groveler eventually combines these
files, but possibly not in time to prevent the restoration operation from running
out of disk space.
Backing Up SIS Links
The backup application calls Sisbkup.dll when it encounters an SIS link, and the
DLL determines whether the backup also needs to back up a Common Store file.
To restore an SIS link, the restoration application calls the DLL, which in turn
determines whether the corresponding Common Store file already exists on the volume
or has already been reported for restoration. If neither is the case, the DLL reports
the Common Store file that corresponds to the link being restored so that the
application can restore it. Because Common Store files have universally unique file
names, and their content never changes after they are created, if the Common Store
file still exists on the volume, there is no need to restore over it; simply linking
to it suffices.
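The net effect of an SIS-aware backup is that each Common Store file is written to the backup once, however many links refer to it. The Python sketch below models only that decision; it does not call Sisbkup.dll, whose exported functions are documented in the Platform SDK and are not reproduced here. The input format, the function name, and the `A1.sis` file name are hypothetical.

```python
def plan_backup(entries):
    """Return the items an SIS-aware backup would write.

    `entries` is an iterable of (path, common_store_name_or_None); None marks an
    ordinary file. Each link is recorded as a small link entry, and each Common
    Store file is added only the first time any link refers to it.
    """
    plan, seen_store_files = [], set()
    for path, cs_name in entries:
        if cs_name is None:
            plan.append(("file", path))
            continue
        plan.append(("link", path))
        if cs_name not in seen_store_files:
            seen_store_files.add(cs_name)
            plan.append(("common_store_file", cs_name))
    return plan


entries = [(f"user{i}/attachment.doc", "A1.sis") for i in range(50)]
entries.append(("notes.txt", None))
plan = plan_backup(entries)
kinds = [kind for kind, _ in plan]  # 50 links, 1 Common Store file, 1 ordinary file
```

An SIS-unaware backup, by contrast, would read 50 full copies of the attachment through the links.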
Restoring SIS Files
SIS links are implemented as sparse files and reparse points. The structure and
contents of a reparse point are opaque to backup and restoration applications. When
restoring an SIS link, a restoration application should perform the following steps:
- Determine the Common Store file or files to which the SIS link points (by calling Sisbkup.dll).
- If the file or files do not exist in the Common Store, restore the file or files along with the SIS link.
- If the SIS link points to a Common Store file or files that exist on the disk, restore only the SIS link. Because the data in Common Store files never changes, a Common Store file that is still on the disk at restore time has the same contents as when it was backed up, and there is no need to overwrite it.
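The restoration steps above can be expressed as one decision per link. In this hypothetical Python sketch, `existing_store_files` stands for the Common Store files already on the volume; in a real application, that determination is made through Sisbkup.dll. The function name and file names are illustrative.

```python
def restore_actions(links, existing_store_files):
    """Plan the restore of SIS links given as (link_path, common_store_name) pairs.

    Every link is restored. A Common Store file is restored only if it is neither
    on disk nor already scheduled; existing Common Store files are never
    overwritten, because their contents never change after creation.
    """
    actions, scheduled = [], set()
    for link_path, cs_name in links:
        if cs_name not in existing_store_files and cs_name not in scheduled:
            scheduled.add(cs_name)
            actions.append(("restore_common_store_file", cs_name))
        actions.append(("restore_link", link_path))
    return actions


acts = restore_actions(
    [("a.doc", "X.sis"), ("b.doc", "X.sis"), ("c.doc", "Y.sis")],
    existing_store_files={"Y.sis"},
)
# X.sis is restored once (before its first link); Y.sis is already on disk.
```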
The only additional overhead required for SIS-assisted backups is that the backup
application must back up the SIS link and the data associated with the backing files.
Note: All SIS backup and restoration operations
are local to a specific volume.
Backup and restoration of encrypted files is enabled by the Raw Encryption API,
which reads and writes encrypted files while keeping the data in encrypted format.
Because the data is never decrypted during backup or restoration, the security of
the backed-up data is maintained.
The Windows Server 2003 Platform SDK provides more information about processing
requirements for backup and restoration applications. To download this SDK, go to
http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdk-full.htm.
Microsoft Clustering Support for SIS
SIS can be used with clustering, as long as every node in the cluster runs the
Single Instance Storage service. After a failover, the node that takes over the
volume (the receiving node) can access SIS files only if it is running the SIS
Storage Filter driver.
To back up or restore SIS files most efficiently, the receiving node requires the
SIS Backup API (sisbkup.dll), which is installed by default. The backup application
needs to use the API to realize the media storage savings that SIS provides;
otherwise, it backs up every linked file as a full, separate copy.
Note: When SIS is enabled on a Windows Storage Server 2003 R2 Service Pack 2
(SP2)-based server cluster or a Windows Unified Data Storage Server 2003-based server
cluster with one or more disk resources, the cluster resource group may not be able
to fail over between the cluster nodes. The server administrator must restart the
cluster node where the failure occurs to allow the failover to continue. Visit
http://support.microsoft.com/kb/947266 for more information.
Best Practices
This section includes best practices and related observations compiled from network
administrators who have deployed SIS at Microsoft.
The best practices for deployment activities include the following:
- SIS can be deployed only on local NTFS volumes.
- SIS actively monitors and consolidates up to six volumes. Administrators of servers
with a large number of volumes should configure SIS accordingly and apply SIS to the
volumes with the best potential for space savings (that is, the volumes with the most
duplicate files).
- SIS cannot be used on the system or boot volume or on remote drives.
- When possible, administrators should co-locate similar content on
the same volume, because SIS does not merge files across separate volumes. Good
candidates include file shares within the same group or department, My Documents
folders, application content and media, and application installation shares.
- SIS merges only files that are 32 KB or larger. An organization can
benefit considerably from using SIS to monitor volumes that house files much larger
than 32 KB. Generally, the larger the average file size, the greater the benefit
realized. Deploying SIS on a volume where files average 32 KB or less provides
the least benefit.
- In general, SIS should not be used on volumes that are experiencing
high levels of I/O, because SIS merge and unmerge operations incur additional write
I/O.
- Because SIS runs as a background service and uses an efficient algorithm
for identifying duplicate files, SIS impact to CPU performance is negligible in
most scenarios.
- If SIS has consolidated files on a volume, removing SIS from that volume requires
manual intervention.
- Administrators should be aware of possible interactions with other
storage filter drivers, such as quota filters, and test platforms accordingly prior
to deployment.
- Using SIS-aware backup applications will decrease space requirements
for tape and other media, in addition to decreasing network utilization during backups.
- SIS can be used to consolidate duplicate files that have distinct
file-level permissions. Each duplicate file transparently retains its own distinct
security settings after SIS merges it.
- SIS currently is available as a stand-alone service only within Windows
Storage Server 2003 R2. Administrators who want to benefit from SIS technology
should contact their Microsoft solution providers for details.
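Several of these practices (merge only files of 32 KB or more, no merging across volumes, favor volumes with many duplicates) can be combined into a rough pre-deployment estimate. The Python sketch below is a hypothetical planning aid, not a Microsoft tool: it walks one directory tree (treated as a single volume), ignores files under 32 KB (assuming 1 KB = 1,024 bytes), groups the remaining files by size and full-content SHA-256 hash, and subtracts the roughly 300-byte per-link overhead cited earlier in this paper. It ignores Common Store metadata and NTFS allocation granularity.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

MIN_SIZE = 32 * 1024   # SIS merges only files of 32 KB or more (assuming 1 KB = 1024 bytes)
LINK_OVERHEAD = 300    # approximate disk overhead of one SIS link, in bytes


def content_digest(path, block=1 << 20):
    """SHA-256 of the whole file, read in 1 MB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block), b""):
            h.update(chunk)
    return h.digest()


def estimate_savings(root):
    """Rough estimate of the bytes SIS could reclaim for files under `root`."""
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size >= MIN_SIZE:
                groups[(size, content_digest(path))].append(path)
    total = 0
    for (size, _), paths in groups.items():
        n = len(paths)
        if n > 1:
            # n full copies become one Common Store copy plus n small links.
            total += (n - 1) * size - n * LINK_OVERHEAD
    return total


# Three copies of a 100,000-byte file, a pair of 10,000-byte duplicates that fall
# below the threshold, and one unique file.
with tempfile.TemporaryDirectory() as root:
    big, small = os.urandom(100_000), os.urandom(10_000)
    contents = {"a1": big, "a2": big, "a3": big, "s1": small, "s2": small,
                "u": os.urandom(50_000)}
    for name, data in contents.items():
        with open(os.path.join(root, name), "wb") as f:
            f.write(data)
    estimate = estimate_savings(root)   # (3 - 1) * 100,000 - 3 * 300 = 199,100
```

Running such an estimate per volume is one way to choose which volumes to place under SIS when a server has more volumes than SIS will monitor.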
The best practices for postdeployment activities include the following:
- An administrator should never disable the SIS Storage Filter driver
unless he or she is removing SIS from a volume. The SIS Storage Filter driver is
required for accessing files in the SIS Common Store.
- If the SIS Storage Filter is inadvertently disabled, an administrator
can enable the Storage Filter by using a 0x0 startup type to re-establish access
to the SIS volume. For more information about using a 0x0 startup type with SIS,
refer to the article "Overview of memory dump file options for Windows Server 2003,
Windows XP, and Windows 2000" at
http://support.microsoft.com/?kbid=254649.
- An administrator should never delete the SIS Common Store folder unless
he or she is removing SIS from all volumes on a system. The Common Store holds the
only copies of SIS-identified duplicate files, and removing the Common Store would
cause such files to be lost.
- In the event of a failover to another cluster node, the SIS Groveler
service will need to be restarted because Groveler does not automatically detect
new volumes, including volumes that fail over. The network administrator will need
to manually recycle Groveler or use a scheduling tool to recycle Groveler periodically.
Conclusion
The SIS feature of Microsoft Windows Storage Server 2003 R2 provides an important
resource for organizations that are searching for ways to reduce demands for storage
resources. Internal deployments on more than 200 servers at Microsoft found an average
storage reduction of 25 to 40 percent, enabling the company to reduce storage by
14.5 terabytes. By replacing duplicate files with links that point to a single copy
in the SIS Common Store, organizations also benefit by reducing main memory cache
loads, thereby reducing overhead and server resource requirements.
An organization can use the SIS Backup API to create SIS-aware backup and restoration
applications, which can take advantage of the storage savings of SIS, so that the
organization can use smaller backup windows and less backup media. An organization
can also deploy SIS as part of a Windows Clustering cluster, enhancing data availability.
SIS is easy for organizations to take advantage of because it is completely transparent
to users, who continue to interact with files as if they were not stored elsewhere.
The efficiency of SIS is underscored by the fact that about 79 percent of files
are used in a read-only fashion. At the same time, SIS gracefully handles changes:
when a user alters a file, SIS breaks that file's link to the Common Store and
simply stores it as the nonduplicate file it has become.
For More Information
For more information about Microsoft products or services, call the Microsoft Sales
Information Center at (800) 426-9400. In Canada, call the Microsoft Canada information
Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact
your local Microsoft subsidiary. To access information through the World Wide Web,
go to:
http://www.microsoft.com
To see other Microsoft IT Showcase white papers, go to:
http://www.microsoft.com/technet/itshowcase
To visit the Windows Storage Server 2003 home page, go to:
www.microsoft.com/windowsserversystem/wss2003/default.mspx
To download the Windows Server 2003 Platform SDK, go to:
http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdk-full.htm.
To find more information on SIS architecture, go to:
http://research.microsoft.com/sn/Farsite/WSS2000.pdf.