MS Windows NT Server 4.0 Enterprise File Server Scalability and Performance


By David B. Cross, Microsoft Consulting Services

On This Page

Introduction
Executive Summary
Project Overview
Server Configuration
Environment Testing
Performance Analysis
Conclusion

Introduction

Microsoft® Consulting Services (MCS) engaged with an enterprise customer to assess and analyze the scalability and performance of Microsoft® Windows NT® Server 4.0 in an enterprise file server capacity. The customer currently has 2,000 file servers deployed on its network and is interested in lowering the cost of maintaining these servers. The customer's specific goals include the following:

  • Consolidating multiple file servers into a single Windows NT Server Enterprise Edition cluster

  • Significantly reducing the number of system administrators required to manage their network file servers

  • Providing fast access to large amounts of data (over 2 terabytes) stored on networked file servers

  • Providing access to a large number of network users (10,000) while maintaining fast response times

  • Minimizing the amount of scheduled and unscheduled downtime of the networked file servers

This document covers the following topics:

  • Procedures used to test the scalability and performance of Windows NT Server file services

  • Necessary hardware configuration required to achieve the desired customer results

  • Key tuning parameters used to tune the environment for optimal file server performance

Executive Summary

Windows NT Server 4.0 met or exceeded the customer's expectations in an enterprise file server environment. The customer plans to replace approximately 20 to 60 existing file servers with each Windows NT Server 4.0-based cluster. This will result in a significantly lower total cost of ownership because of the substantial reduction in file server hardware and in the administrative overhead required to maintain large numbers of distributed file servers. The test results in this document show that Windows NT Server 4.0 is capable of meeting enterprise requirements for very large file servers.

The test team validated the scalability and performance of Windows NT Server 4.0, Enterprise Edition to support an enterprise-level file server deployment. The following specific technical objectives were achieved:

  • Validated that Windows NT Server 4.0 is capable of supporting 10,000 users per Windows NT Server cluster

  • Scaled a redundant external drive array system under Windows NT Server to manage 2 terabytes of disk space

  • Determined the appropriate tuning parameters to achieve optimal performance in an enterprise file server deployment

  • Designed a fault tolerant system minimizing single points of failure through hardware and software redundancy

  • Validated that Windows NT Server can scale to support large numbers of file shares (8,000)

  • Validated gigabit Ethernet performance on Windows NT Server 4.0

Summary Results

The project team was able to provide a system configuration that exceeded the customer's expectations.

  • Data throughput requirements. The clustered system provided an average data throughput of 39 megabytes (MB) per second (312 Mbps) when the system was placed under a sustained client load. The total system CPU utilization was consistently below 80 percent. The customer requirement was 30 MB/second from the clustered system, which was exceeded.

  • Network throughput requirements. The network throughput requirement of 15 MB/sec per node was exceeded using either four Fast Ethernet adapters or a single gigabit Ethernet adapter. However, the single gigabit configuration used fewer CPU resources than the multiple Fast Ethernet adapters because the single gigabit adapter generated fewer network interrupts.

  • File share expectations. Testing proved that a large number of shares (8,000) does not noticeably decrease the performance or data throughput of the system.

  • Response time requirements. The customer's defined client-response time requirements were met or exceeded. Specifically, 64-kilobyte (KB) file transfers were consistently below one second and 1-MB file transfer averages were consistently below five seconds.

Project Overview

The project was developed to test the scalability and performance of Windows NT Server 4.0, Enterprise Edition using Service Pack 4 (SP4) on the latest Intel platform hardware. The project was designed to demonstrate the ability of Windows NT Server to meet the requirements of an enterprise file server based on customer demands. A composite team consisting of the customer, Microsoft Consulting Services, and Compaq was formed to design and test the system, based on the customer-defined performance requirements.

Project Goals

The customer defined several specific performance goals, based on an analysis of their production environment, to serve as benchmarks for enterprise scalability. All benchmark goals were met or exceeded during the performance tests:

  1. Sustain an average data throughput of 15 megabytes per second (MB/second) from each node in a Windows NT Server cluster running in an active-active configuration. An active-active configuration consists of both cluster nodes simultaneously operating and providing data to end-client machines.

  2. Sustain an average data throughput of 30 MB/second from a single node of a Windows NT Server cluster when running in failover mode.

  3. Maintain approximately 80 percent of the data throughput as data transmitted from the server; that is, the workload consists primarily of users requesting or reading files from the server.

  4. Maintain an average response time of one second or less for small files. Small files are defined as 64 KB files in a Bluecurve Dynameasure standard file set.

  5. Maintain an average response time of five seconds or less for large files. Large files are defined as 1 MB files in a Bluecurve Dynameasure special file set.

  6. Ensure that a large number of file shares would not adversely affect performance.

Testing Methodology

An isolated test laboratory was prepared to conduct the tests. A flat network model using Compaq gigabit switches was designed to eliminate the potential for switch latencies associated with multiple tier networks. A flat network model is defined as a network that does not cascade Ethernet switches between client machines and the server. The Bluecurve Dynameasure (https://www.bluecurve.com) software was chosen as the load testing software because it best simulated the customer environment and client workload.

Testing scope

The scope of the testing was limited to the following:

  • Evaluation of the Windows NT Server components that were considered critical to the scalability of Windows NT Server in an enterprise-level file server deployment.

  • Recommendations for configuration of Windows NT Server components and the use of Windows NT Server in an enterprise-level deployment based on the scalability testing results.

The determination of the critical Windows NT Server components was made on the basis of the expected use of Windows NT Server in an enterprise-level deployment.

Excluded from the scope were the following:

  • Assessment of wide area network cross router issues.

  • Assessment of the impacts of Microsoft Windows NT Server infrastructure (for example, Master Account Domains, Windows Internet Name Service) on Windows NT Server file server performance or operation.

  • Assessment of hardware redundancy or operational requirements.

  • Definition of the production support required for an enterprise Windows NT file server deployment.

Server Configuration

This section details the various hardware, software, and tuning parameter configurations utilized in the testing performed. The following subsections describe the settings used for each of these categories. The configuration detailed here is the final configuration used to achieve the results detailed in the performance results section of this document. This configuration is based on multiple performance tests performed to determine the optimum configuration, based on the customer environment.

System Hardware

Compaq provided the following hardware configuration for testing and analysis of Windows NT Server 4.0, Enterprise Edition file server performance. Table 1 below details the final server hardware configuration:

Table 1 System Configuration

Operating System: Windows NT Server 4.0, Enterprise Edition with Service Pack 4
Processor: Four 400-MHz Xeon processors
System Bus: 100 MHz
PCI Buses: One 64-bit 66 MHz (5 slots) and two 32-bit 33 MHz
L2 Cache: 1 MB, four-way set-associative
Memory: 2 GB, 50 ns
Disk Adapter: Two Emulex LP7000 Fibre Channel adapters
Disk Controller: Two StorageWorks Fibre Channel controllers, 256 MB effective cache memory per controller
Disks: Ninety-two 18-GB Fibre Channel disks in eight 10-disk partitions
Interrupts: Default dynamic interrupt distribution
Striping: Hardware parity and striping (RAID 5)
Disk Format: NTFS
Network: Flat network using two switched gigabit hubs, one gigabit adapter in the server
Clients: 24 dual-processor 400-MHz Pentium II machines with 256 MB RAM and Fast Ethernet

Software

A number of tuning parameters were examined and tested to determine their impact on the simulated customer environment. Based on the testing results, the following tuning parameters, which had a positive impact on the performance of the system, are provided.

Note: The various changes applied to the system tested were specific to the customer environment and the hardware employed in testing. These tuning parameters may not improve performance in every environment.

Base Configuration

The following configuration changes were applied to a base installation of Windows NT Server 4.0, Enterprise Edition with Service Pack 4 to maximize the performance of a very large server dedicated to providing file services:

  1. Configured the server as a member server. Microsoft does not recommend that large dedicated file servers be configured as backup domain controllers, due to the overhead associated with the Netlogon service.

  2. Optimized the Server service for file and print sharing (Maximize Throughput for File Sharing). File caching is adversely affected if the service is not configured this way.

  3. Formatted logical volumes with 64-KB allocations. Setting the allocation size to 64 KB improves the efficiency of the file system by reducing fragmentation of the file system and reducing the number of allocation units required for large allocations. This is accomplished through the following command line entry:

    format x: /A:64K /fs:ntfs

  4. Increased NTFS log file to 64 MB. Setting the NTFS log file to 64 MB reduces the frequency of NTFS log file expansion. Log file expansion is costly because it locks the volume for the duration of the log file expansion operation. This is accomplished through the following command line entry:

    Chkdsk x: /L:65536

  5. Configured page file across stripe (RAID 0) set. To achieve maximum page file performance, the system page file was configured to utilize a RAID 0 stripe set on the StorageWorks unit. Note: The maximum swap file size on an individual disk is 4095 MB, and performance degradation may occur if more than four disks are used in a stripe set.
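For convenience, steps 3 and 4 above can be applied to every logical volume in one pass. The following is a minimal batch-file sketch; the drive letters E: through L: are hypothetical placeholders for the eight RAID 5 logical disks and must be replaced with the letters assigned in your environment (format prompts for confirmation on each volume):

    rem Format each data volume with 64-KB allocation units, then set a 64-MB NTFS log.
    for %%d in (E F G H I J K L) do format %%d: /A:64K /fs:ntfs
    for %%d in (E F G H I J K L) do chkdsk %%d: /L:65536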

Registry Changes

The following registry changes were made to optimize the performance of the system, as well as to ensure that data collection does not interfere with performance:

HKLM\SYSTEM\CurrentControlSet\Services\N1005\Parameters\NumRXDescriptors=0x300 (REG_DWORD)

HKLM\SYSTEM\CurrentControlSet\Services\N1005\Parameters\NumCoalesceBuffers=0x300 (REG_DWORD)

HKLM\SYSTEM\CurrentControlSet\Services\N1005\Parameters\NumTXDescriptors=0x400 (REG_DWORD)

The previous three entries were changed to maximize the size of the buffers on the Compaq gigabit adapter to alleviate any TCP retransmissions and maximize the network adapter performance. The three entries may also be changed through the Compaq Setplus utility.

HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters\ProcessorAffinityMask=0 (REG_DWORD)

This registry entry allows the operating system to balance the network adapter-generated interrupts across all processors. If the value of this entry is 0, DPCs are serviced by the same processor that serviced the interrupt. This setting is useful for platforms that distribute interrupts among all processors, such as the Windows NT 4.0 platforms based on Intel Pentium and Pentium Pro (P6) processors.

HKLM\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters\TcpTimedWaitDelay=0x1e (REG_DWORD)

The TcpTimedWaitDelay registry entry sets the length of time a timed-wait Transport Control Block (TCB) is kept before being returned to the free list to 30 [0x1e] seconds. Timed-wait TCBs are kept after a disconnection as part of graceful close of a network connection.

HKLM\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters\MaxHashTableSize=8192 (decimal entry)

In a large system, the number of TCP sessions may necessitate increasing the default size of the TCB hash table to reduce the amount of CPU time spent finding TCBs. This registry entry changes the maximum size of the hash table from 512 entries to 8192 entries.

HKLM\SYSTEM\CurrentControlSet\Services\TCPIP\Parameters\TcpWindowSize=0xffff (REG_DWORD)

This registry entry increases the TCP receive window size to its maximum of 64 KB, which is necessary for large-memory servers that receive a high volume of network traffic.

HKLM\SYSTEM\CurrentControlSet\Services\(all NetFlex3 ports)\Parameters\MaxReceives=200 (decimal entry)

Note: Many Ethernet adapters may allow this change through the user interface, and not the registry.

HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\UnusedFileCache=0x14 (REG_DWORD)

This registry entry allows the system to improve memory utilization for the file system cache and allows more files to be open simultaneously on a large system. It can, however, utilize additional paged pool memory. For more information, see Knowledge Base article 192409.

HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\I/O System\LargeIrpStackLocations=0x7 (REG_DWORD)

LargeIrpStackLocations was set to 7 because the disk performance counter was enabled to collect data on disk IO operations. If the disk performance counter is on and LargeIrpStackLocations is not increased to 7, the IRPs for disk IO are larger than the pre-sized look-aside list allocation size, which causes the IRPs to be allocated from non-paged pool rather than the look-aside list. If the disk performance counter is turned off, LargeIrpStackLocations can be set to 4 or removed. An additional 2 to 3 percent system throughput and performance may be gained on some systems if the disk performance counters are not collected.
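For reference, the disk performance counters discussed in this entry are enabled and disabled with the diskperf utility, and a restart is required for the change to take effect. Running

    diskperf -y

enables the counters (as was done for these tests), and

    diskperf -n

disables them if the additional 2 to 3 percent of system throughput is preferred over disk statistics.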

Note: The number of LargeIrpStackLocations required increases in the Microsoft Windows 2000 operating system, and the value may have to be increased accordingly.
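Where it is convenient to apply several of these values at once, the NDIS and TCP/IP entries above can be captured in a Registry Editor (.reg) file similar to the following sketch. The value names and data are taken directly from the entries above; the adapter-specific and Session Manager entries are omitted because their key names vary with the installed driver and service pack:

    REGEDIT4

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NDIS\Parameters]
    ; Allow DPCs to be serviced on the processor that handled the interrupt
    "ProcessorAffinityMask"=dword:00000000

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
    ; 0x1e = 30-second timed-wait delay
    "TcpTimedWaitDelay"=dword:0000001e
    ; 0x2000 = 8192 TCB hash table entries
    "MaxHashTableSize"=dword:00002000
    ; 0xffff = 64-KB TCP receive window
    "TcpWindowSize"=dword:0000ffff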

Compaq StorageWorks Configuration

  1. The maximum allowable cache memory of 512 MB (256 MB mirrored) was installed in StorageWorks controllers.

  2. Two gigabytes (GB) of RAM was installed in the Compaq Xeon server, with homogeneous DIMMs across all memory slots. Based on the observations and testing of the project team, 2 GB of RAM gave the system near maximum performance as a file server. If applications other than file services are run on the server, 4 GB of RAM may provide additional performance. This configuration under Windows 2000 Advanced Server may achieve additional performance with 4 GB of RAM because the working set of the file system cache and the non-paged pool limits are expanded.

  3. Configured RAID 5 logical disks with ten physical disks in each logical disk. A total of eight RAID 5 logical disks were used for performance testing. Although the overall system contained a total of one hundred forty-four 18.1-GB hot-swappable Compaq drives, eight logical disks were used for ease in testing. The logical disk size was based on a compromise of physical drive performance against LUN numbering limitations of both StorageWorks and Windows NT Server. Unused disks in the system were utilized as online spares for the various RAID 5 sets.

  4. Set MAXIMUM_CACHED_TRANSFER_SIZE=256 on StorageWorks.

  5. Set CHUNKSIZE=128 KB blocks on StorageWorks. This was the maximum size permitted on StorageWorks.

  6. Installed Compaq SecurePath software for StorageWorks. The preferred path was set so that the logical drives were split evenly between the two controllers, which ensured equity in the transfer rate across the drives.

Client Configuration

The project team chose to utilize Bluecurve Dynameasure software, based on an analysis of the customer environment and the client workload generated in that environment. All tests utilized a Dynameasure 200-MB standard file set, consisting of 64-KB files, and a Dynameasure special file set, consisting of 1-MB Microsoft Word, Excel, and PowerPoint files. A delay time of 1 second was used for both the standard and special file sets. A total of 24 Bluecurve client computers were incorporated into the test, with 400 to 600 motors distributed equally across all computers. Each computer was configured with dual Pentium II 400 MHz processors, 256 MB of RAM, and a 100-Mbps Fast Ethernet connection directly to a switch.

Environment Testing

This section describes the testing environment and the tools employed in performance measurement. Bluecurve Dynameasure software (Service Pack 2) was chosen as the primary tool for performance measurement and workload generation, based on the ability of the software to most closely simulate the customer environment. The testing occurred over a period of six weeks during November and December 1998.

Bluecurve Dynameasure

Bluecurve Dynameasure is a Windows NT capacity planning and reliability management tool that combines stress-testing techniques with utilization monitoring.[1] Dynameasure implements a close approximation of users performing real work on networked clients and servers. Its components include a test dataset and multiple test scenarios. The test dataset contains the Dynameasure test scheme and data. Dynameasure provides a scalable test dataset. The test dataset is designed around a scheme typical for a target service, including file services that were utilized in the test performed for this project.

The 2-terabyte Dynameasure Standard File dataset used for this project is a collection of text, data, and image files. The files reside in a shared directory on the target server. (There is no Dynameasure software component on the target server, only data.) The test scenario selected for this project was a combination of the Standard File test and a custom test. These tests, which run concurrently, simulated a test scenario with multiple user profiles. The Standard File test uses a customer-defined mix of read and write transactions. A transaction in this test is a file copy, either from client to server or server to client. Each file is 64 KB in size. Files are read and written in 16-KB blocks. The files are of various types, including text, image, data, and compressed versions of each type.

All test resources and activities (client and server selection, data-set generation, and test and result management) are performed from a client running the Dynameasure Manager. Tests queued and executed from the Manager call upon networked client machines running Dynameasure motors (simulated users) to copy files to or from the target server. During a test, stress is increased in a graduated fashion by adding more motors successively in timed increments. When a test is finished, results can be viewed graphically within Dynameasure or exported to Excel. Performance measurements include transactions per second, average response time, and utilization (server, client, and network).

For more information about Bluecurve and Dynameasure, see https://www.bluecurve.com.

Load Generation/File Mix

File locality and the randomness of the workload play a key role in the testing and performance of a system. A completely random workload that does not utilize any caching functionality of the operating system or the hardware invariably skews the performance results to be lower than expected. Based on an analysis of file server caching in the customer environment, a workload was developed to best simulate that environment with the Bluecurve Dynameasure software.

The following 2-terabyte workload data set was used throughout the analysis and testing:

  • A 100-GB static dataset was created on each logical partition to simulate used space on the disk and in the NTFS log.

  • A 200-MB Bluecurve standard file set was created on each logical partition to simulate small file transfers (64 KB) from users.

  • A special file set consisting of 1-MB Microsoft Word, Excel, and PowerPoint® files was also used for each test on each logical partition to simulate larger file activity from end users.

Performance Monitoring

The Windows NT Performance Monitor (Perfmon.exe) played a key role in the analysis and documentation of the performance results. For additional information, refer to the Performance Monitor help file (Perfmon.hlp) or the Windows NT 4.0 Resource Kit. Table 2 below is provided to help administrators understand and evaluate the performance of similar systems undergoing similar performance tests.

Table 2 Performance Monitoring

Each entry below lists the Performance Monitor object and counter, followed by a description.

CPU Counters

Three Performance Monitor counters determine whether the CPU is a bottleneck on a particular server.

Memory: Cache Bytes
Cache Bytes is the amount of cache memory currently being used by the system. The maximum amount of RAM that the system may use for caching is 512 MB. Note: Windows 2000 will increase the NTFS cache size limit to 3838 views and a total of 950 MB.

Memory: Cache Bytes Peak
Cache Bytes Peak is the maximum number of bytes used by the cache at any given time.

Memory: Pages/sec
Pages/sec is the number of pages read from or written to disk to resolve hard page faults. (Hard page faults occur when a process requires code or data that is not in its working set or elsewhere in physical memory and must be retrieved from disk.) A high level of paging activity can be acceptable (pages/sec >500), but if it is associated with low available bytes, a problem may exist.

Memory: Pool Nonpaged Bytes
Pool Nonpaged Bytes is the number of bytes in the non-paged pool, an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk but must remain in physical memory as long as they are allocated. The system may be overloaded when this value is greater than 120 MB or the sum of the paged and non-paged pools totals 256 MB. Note: This counter displays the last observed value only; it is not an average.

Memory: Pool Paged Bytes
Pool Paged Bytes is the number of bytes in the paged pool, an area of system memory (physical memory used by the operating system) for objects that can be written to disk when they are not being used. The system may be overloaded with a large SAM size or a large number of user sessions when this value is greater than 156 MB, or the sum of the paged and non-paged pools totals 256 MB. Note: This counter displays the last observed value only; it is not an average.

Cache: Copy Read Hits %
Copy Read Hits % is the percentage of read requests found in the cache. Based on an analysis of the customer environment, the goal was to emulate approximately an 80 percent cache hit rate.

Server: Pool Nonpaged Failures
This counter records the number of times that allocations from the non-paged pool have failed. If the value is greater than 1, the server does not have enough physical memory.

Server: Pool Paged Failures
This counter records the number of times that allocations from the paged pool have failed. If the value is greater than 1, there is not enough physical memory, or the paging file is too small.

Processor: % Processor Time
% Processor Time is the percentage of time that the processor is executing a non-Idle thread.

System: % Total Processor Time
% Total Processor Time is an aggregate percentage of time that all the processors in the server are executing a non-Idle thread. The general rule is that total system processor time should not exceed 80 percent for an extended period of time, and no individual processor should be sustained at 100 percent.

System: Total Interrupts/sec
Total Interrupts/sec is the average number of hardware interrupts that the processors receive and service each second. This counter was monitored to measure the difference in interrupts generated by multiple Fast Ethernet adapters versus a single gigabit Ethernet adapter. A CPU bottleneck may often be caused when an inordinate number of hardware or network interrupts are generated.

Server Work Queues: Queue Length
Queue Length is the current length of the server work queue for a particular CPU. This is an instantaneous count, not an average over time. It is common to see sharp spikes and dips when monitoring this counter; therefore, it is necessary to monitor the counter over time and capture an average. The general rule is that if an individual processor is running consistently above 85 percent and the average Server Work Queue length is consistently higher than 3, the CPU is limiting the performance of the server. For a multiple-CPU (SMP) server, the guideline is that the CPU is the limiting factor when % Total Processor Time is greater than 85 percent and the aggregate queue length is greater than two times the number of CPUs in the server; on the four-processor server tested, for example, that threshold is an aggregate queue length of more than 8.
To capture a possible CPU bottleneck in Performance Monitor, an administrator must run Performance Monitor over an extended period of time, as well as in a granular fashion. Extended monitoring may not highlight points in time when counters are unusually or unacceptably high; these are better captured with granular monitoring. Therefore, it may be necessary to run multiple instances of Performance Monitor against a single server.

Network Counters

Two specific network counters were monitored to ensure that the system was performing properly from a network perspective.

Network Interface: Bytes Total/sec
Bytes Total/sec is the rate at which bytes are sent and received on a selected network interface, including framing characters. This counter was monitored to ensure that the network bytes/second closely matched the disk bytes/second when under a constant load.

Network Interface: Packets Outbound Errors and Packets Received Errors
These counters indicate the number of outbound and inbound packets that could not be transmitted or delivered to a higher-layer protocol because of errors. If either counter is greater than 1, the selected network interface may be experiencing errors.
Note: The Network Interface and Network Segment objects are not installed by default on Windows NT Server. The SNMP Agent service and the Network Monitor Tools and Agent must be installed prior to collecting these counters.

Data Throughput

Server: Bytes Total/sec
Bytes Total/sec is the number of bytes the server has sent to and received from the network. This counter was monitored and recorded to measure the average data throughput the system produces when under a test load.

Server: Bytes Received/sec
Bytes Received/sec is used in calculating the server Bytes Total/sec. It is monitored along with Bytes Transmitted/sec to ensure that the workload is adequately segmented for the server to transmit more data than it receives.

Server: Bytes Transmitted/sec
Bytes Transmitted/sec is used in calculating the server Bytes Total/sec. It is monitored along with Bytes Received/sec to ensure that the workload is adequately segmented for the server to transmit more data than it receives.

Physical Disk: % Disk Time
% Disk Time is the percentage of time that a disk is busy. The general rule is that the total % Disk Time for all logical disks should be less than 85 percent.

Physical Disk: Average Disk Queue Length
Average Disk Queue Length is the average number of outstanding requests for disk access. The general rule of thumb is that the average disk queue length should be less than or equal to 3. For a hardware RAID set, note the actual number of spindles and scale the guideline accordingly; for example, a 10-spindle RAID 5 logical disk could sustain an average queue length of roughly 30 (3 per spindle) before the disks become the limiting factor.

Performance Analysis

After the software is tuned correctly, verify that the system resources (CPUs, RAM, network, and so on) are being utilized effectively, and identify and eliminate any constrained resources or system bottlenecks. For example, low CPU utilization is a good indication that one of the following resources could be a bottleneck:

  • Disk subsystem

  • Memory

  • Network

On the other hand, if CPU utilization is at 100 percent, either you need to add CPUs (if the server is capable of accepting additional CPUs) or the system has reached its full potential.

This section details the results of the test phase of the project.

System Bottlenecks

Based on monitoring of the performance counters outlined in the previous section, the project team was able to identify and eliminate bottlenecks in the system. When the system was stressed, CPU utilization with the gigabit Ethernet adapter was approximately 80 percent, and the network throughput was 39 MB/sec. Because CPU utilization was below 100 percent and network throughput was less than 100 MB/sec (well below the capacity of the gigabit link), both the CPU and the network were eliminated as bottlenecks.

Through further analysis, it was determined that the load generated by the clients was less than that required to push the system to its full potential. Because the performance requirements of the customer were met with the available client load, the system was not pushed any further.

Network Analysis

With a flat network model, the project team observed equal system throughput for the single gigabit adapter and the four Fast Ethernet adapters in the Compaq Xeon server. However, the single gigabit adapter generated fewer network interrupts (a reduction of 70 percent) and effectively lowered the overall system CPU time (by approximately 5 percent). See Figure 2. In contrast, a two-tier network exhibited a number of performance problems. The following issues relating to that design were observed:

  • The switched characteristics of the hubs were lost because switching occurred on both tiers, which resulted in collisions.

  • The multiple levels of switching increased the overall latency.

  • When a flat-model design was implemented, a 30 percent gain in throughput resulted compared to a two-tier network.

File Shares and Sessions Analysis

Based on the assumption that an enterprise file server would inherently have a large number of file shares created, the project team tested the performance and operations of the system with 8,000 file server shares installed. The project team found no inherent performance or operational changes of significance through the addition of a large number of shares on the system. Separately, an analysis was made of the memory usage of file shares and connection costs of TCP sessions. The project team documented the following:

  1. A connection structure (TCP connection) requires ~ 500 bytes paged and 500 bytes non-paged space.

  2. For every resource that a client has opened on a server (net use connection), an additional 100 bytes paged and 8 bytes non-paged memory are required.

  3. Each share created on a server requires approximately 250 bytes in the paged pool: roughly 200 bytes for the share structure and approximately 50 bytes for the name and path stored in the registry.
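To put these figures in perspective, the following is a rough, illustrative estimate for the target environment of 10,000 users and 8,000 shares, assuming one TCP connection and one open resource per user:

    10,000 connections x ~500 bytes = ~5 MB paged pool and ~5 MB non-paged pool
    10,000 open resources x ~100 bytes = ~1 MB paged pool (plus ~80 KB non-paged pool)
    8,000 shares x ~250 bytes = ~2 MB paged pool

The total of roughly 8 MB of paged pool and slightly over 5 MB of non-paged pool is small compared with the pool thresholds listed in Table 2, which is consistent with the finding that a large number of shares and sessions did not measurably affect performance.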

Microsoft Cluster Server (MSCS) Performance

A separate analysis was made to determine the impact of Microsoft Cluster Server on the performance of a large file server. It was determined that the impact on performance and the added system overhead were minimal. In fact, the project team found that Microsoft Cluster Server added a level of reliability and availability to the system when scheduled maintenance or changes needed to be applied to the cluster. For example, under a sustained load, the system was able to execute a failover of all resources (2 terabytes of disk space and multiple network names) during a simulated node failure within 1 to 2 minutes.

Windows 2000 Advanced Server Performance

The project team used an interim build of Windows 2000 Advanced Server to run a basic test with Windows 2000 Advanced Server loaded on the system. With no tuning parameters applied, the Beta 2 build of Advanced Server improved throughput and performance by 10 percent, with average data throughput at approximately 43 MB/second.

Performance Charts

The following performance charts provide a graphical representation of the data collected by Performance Monitor during Bluecurve Dynameasure performance runs.

Data Throughput

One of the customer-defined primary goals was to achieve a sustained throughput of 30 MB/second. With the identified configuration, the system yielded an average throughput of 39 MB/second with approximately 80 percent total system CPU utilization. The ability to achieve greater data throughput was limited only by the number of clients available to generate the workload; the project team was limited to 24 workstations. In a separate test at Compaq's Enterprise Solution Center in Houston, Texas, the system under an identical load, with both nodes of the cluster operating in an active-active configuration, achieved an aggregate of over 50 MB/second. The data is displayed in Figure 1 below, which represents a Bluecurve Dynameasure test with 600 client motors against a single-node system.


Figure 1: Data Throughput and CPU Utilization

CPU Utilization

As mentioned earlier, tests were conducted to compare the performance of four Fast Ethernet network adapters against a single Compaq Gigabit Ethernet adapter. The test results yielded nearly identical average data throughput; however, the single Gigabit adapter required less total system CPU time because the system processors serviced fewer network-generated interrupts. Theoretically, due to the greater available CPU time, a single Gigabit Ethernet adapter would provide higher maximum data throughput. The CPU comparison is shown in Figure 2 below, again with a Bluecurve Dynameasure test consisting of 600 client motors.


Figure 2: CPU Utilization Comparison (4) Fast Ethernet versus (1) Gigabit Adapter

Bluecurve Response Time

Customer requirements included a maximum response time for the Bluecurve clients as part of the performance requirements. Average response time (ART) is the measurement of the average time it takes for a transaction to execute between client and server. ART is a measure of system responsiveness and corresponds to the users' experience of system performance. The response time is measured by the Dynameasure software and recorded in a Microsoft Access database. The customer requirements specified a maximum average response time of one second for 64-KB file transfers and five seconds for 1-MB file transfers. Both requirements were met, as shown in Figure 3 below.


Figure 3: Bluecurve Client Response Time

Bytes Transmitted and Received Comparison

It was important to ensure that the ratio of bytes transmitted to bytes received was approximately 3:1 or higher, demonstrating that the server, rather than the clients, was providing roughly 80 percent of the workload. As seen in Figure 4 below, the server transmitted approximately 75 percent of the workload.


Figure 4: Bytes Transmitted/Received Comparison

Conclusion

The project highlighted a number of specific ways in which companies can take advantage of the scalability and availability of Windows NT Server 4.0, Enterprise Edition. Specifically, companies can do the following:

  • Reduce TCO by consolidating file servers. A specific scenario is the reduction of server administrators. Assuming that every 20 servers require an administrator at an annual cost of $100,000, replacing existing file servers with clusters significantly reduces TCO. By replacing 2,000 existing file servers with 50 two-node clustered servers (100 servers total), a company could reduce its administrative staff from roughly 100 administrators to 5, an annual savings in administrative costs of approximately $10 million.

  • Increase the availability of their network file servers through clustering. Microsoft Cluster Server provides for minimal downtime of file servers through the ability to fail over resources during scheduled and unscheduled outages.

  • Improve the performance of high-end file server implementations on Windows NT Server 4.0, Enterprise Edition by using the tuning parameters and configuration information provided in this paper.

The information contained in this document represents the current view of Microsoft Corp. on the issues discussed as of the date of publication. This is a preliminary document and may be changed substantially prior to final commercial release. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. The entire risk of the use or results of the use of this document remains with the user. Companies, names, and data used in the examples herein are fictitious unless otherwise noted. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Microsoft Corp.

[1] Microsoft Consulting Services collaborated with Bluecurve to produce the Microsoft Infrastructure Capacity and Reliability Management Service Guide, MS part number 098-71619.