Microsoft Commerce Server 2000: Maximizing Performance

No company wants its customers to have poor experiences when they visit its Web site. Customers can become frustrated by slow response times, timeouts, and errors or broken links, prompting them to go to other sites to find what they're looking for. To keep customers interested, you must build an infrastructure that can handle not only average levels of demand but peak levels as well.

The success of your site depends heavily on how well you plan for capacity and manage site performance. To ensure adequate capacity, you must calculate how much computing hardware you need to handle the load that thousands or hundreds of thousands of users can put on your site. These calculations can help you find weak areas that can cause performance degradation. You can address weak areas by adding hardware or by redesigning dynamic pages and other CPU-intensive components.

Good capacity planning can also help you decide how widely to advertise your site to attract more customers, as well as help you plan future infrastructure improvements to adequately handle growth. Realizing your site's full potential depends largely on satisfying the demands of your customers, which means providing:

  • Quality of service 

  • Quality of content 

  • Speedy access 

Capacity planning is the process of measuring a Web site's ability to serve content to its visitors at an acceptable speed. You determine capacity by measuring the number of visitors to your site, determining how much demand each visitor places on the server, and then calculating the computing resources (CPU, RAM, disk space, and network bandwidth) necessary to support current and future usage levels.

The following table lists three factors that determine site capacity.

| Factor | Description |
| --- | --- |
| Number of visitors | As your site attracts more visitors, you must increase capacity or performance will degrade. |
| Server capacity and configuration of hardware and software | Upgrading your computing infrastructure can increase site capacity, thereby allowing more visitors, more complex content, or a combination of the two. |
| Site content | As the content becomes more complex, the servers have to do more work per visitor, thereby lowering site capacity. Sometimes you can increase capacity by simplifying content, minimizing database use and dynamic content, and using simpler HTML pages. |

Capacity planning should be an ongoing concern for any Web site. Whenever any one of the three factors changes significantly, you must recalculate site capacity. Capacity can be expressed as the following equation:

Number of concurrent users = Hardware capacity / Load on hardware per user 

(In this equation, hardware capacity refers to both server and network capacity.)

This capacity equation suggests two corollaries:

  • Decreasing the load that each user puts on the hardware (by planning, programming, and configuring site content to use computing resources more efficiently) enables you to support more concurrent users. 

  • Configuring the site infrastructure to increase hardware capacity can enable you to increase the number of concurrent users. You can increase hardware capacity by scaling the hardware horizontally (adding more servers) or vertically (upgrading existing servers). 

When you study capacity, you should address the following areas:

  • Number of concurrent users supported by the current hardware 

  • Scalability options if the number of concurrent users increases 

  • Scalability options if site content becomes more complex 

  • Potential bottlenecks in the system 

  • Performance guidelines for programmers and other content developers 

  • Site performance predictions 

Logically, managing site performance correlates closely with planning site capacity. You manage site performance to tune your site so that it can support more visitors. To manage performance properly, you must continuously evaluate your site to see whether it is delivering the level of performance you want, and if it is not, tune it until it does. To tune your site, you need to evaluate its architecture and the code in its Active Server Pages (ASP), and then investigate available technologies to see what you can use to enhance performance. You must also make sure that the infrastructure of your site (hardware and software) can support the number of concurrent users on your site with an acceptable response time. Maximizing performance is especially important for e-commerce sites because the number of visitors, as well as the content of the site, can change over a short period of time.

Transaction Cost Analysis

One method of measuring site capacity is called transaction cost analysis (TCA). TCA is a method of measuring the performance cost of a transaction. TCA helps you compare server transactions with one another to determine which ones are putting the greatest demands on your system.

The term transaction (also called operation) refers to work done by a server or servers (Web servers, middle-tier servers, and Microsoft SQL Server servers) to fulfill a user request. For example, a request for a product description page stored in a database is a transaction, as is a request to add an item to a shopping basket. This term does not refer to e-commerce transactions in which money is exchanged.

Dynamic sites that involve database transactions tend to be more complex and place heavier demands on Web servers than static sites that usually serve only static HTML pages. Customers of dynamic sites typically use the sites not only to look up information already stored at the site, but also to add new information of their own.

TCA can help you answer the following questions:

  • What hardware do we need? 

  • How many concurrent users can our site serve? 

  • When do we need to add servers? 

  • Can our site handle peak traffic (such as Back-to-School, holidays, and so on)? 

  • Where are the bottlenecks in our site? 

Documenting Your Site

The first step in performing TCA is to document your site hardware, software, and content, typically as a set of diagrams. These diagrams can help highlight data center issues, and they are very helpful for visualizing traffic flow through your system.

Figure 19.1 shows the type of hardware diagram you should create.

Figure 19.1 Sample hardware diagram 

Figure 19.2 shows the type of software diagram you should create to help you understand any site software issues and to see how the software is interrelated.

Figure 19.2 Sample software diagram 

To document site content, record the following information for each server:

  • Server type (Web, database, and so on) 

  • Description of the content on the server 

  • Directory structure 

  • File permissions 

Analyzing Traffic

Next, you must determine how many users typically visit the site concurrently. This data usually comes from:

  • Market analysis. Analysis of a new site. You have probably commissioned a market analysis report to predict how much traffic your site can expect to receive at the time it is deployed and afterward. Use this report as the basis for your TCA. 

  • Site usage analysis. Analysis of an existing site. Analyze your Web server log files to see how many hits your site receives at any given time, as well as usage trends that might indicate whether parts of the site have become more or less popular over time. When calculating how many concurrent users your site currently supports, remember to base your calculations on peak usage, rather than on average usage. Commerce Server provides Web usage and diagnostic reports for analyzing Web site usage. 

Creating a Site Usage Profile

After you know how many customers are visiting your site, you must then determine how they use it so that you can estimate how much demand a typical customer places on the system. A usage profile describes the way in which customers use a site by determining site traffic patterns, such as how many customers browse a certain page and how many add an item to their basket and then remove it.

To create a usage profile, you need to analyze your site's usage log files. If you have them, use logs gathered over a long period of time (at least a week) to get accurate averages. First, identify operations that customers can do (browse, search, and so on). Next, gather the following data:

  • Number of customers visiting the site 

  • Number of hits each page receives (which pages have been visited) 

  • Time spent on each page 

  • Session length 

  • Peak periods of activity 

You can use the number of visits to each page to profile typical shopper operations for the site. The following table shows a usage profile with typical shopper operations for an e-commerce retailer.

| Shopper operation | Operations per session | Operations per second (transaction frequency) | Percentage of total |
| --- | --- | --- | --- |
| Add Item | 0.24 | 0.00033 | 2.00 |
| Add Item + Checkout | 0.02 | 0.00003 | 0.17 |
| Add Item + Delete | 0.04 | 0.00006 | 0.33 |
| Basket | 0.75 | 0.00104 | 6.25 |
| Default (home page) | 1.00 | 0.00139 | 8.33 |
| Listing | 2.50 | 0.00347 | 20.83 |
| Lookup | 0.75 | 0.00104 | 6.25 |
| New | 0.25 | 0.00035 | 2.08 |
| Product | 4.20 | 0.00583 | 35.00 |
| Search | 1.25 | 0.00174 | 10.42 |
| Welcome | 1.00 | 0.00139 | 8.33 |

This table shows the shopper operations that account for 90 percent of the hits received by the site in this example. As a rule, you should generate a usage profile that lists the pages or operations responsible for 90 percent of the total hit count of your site. Note that files such as images are not included in this table, for simplicity; if image and static HTML file requests contribute to the top 90 percent of the hits received by your site, be sure to include them in your usage profile.
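
If it helps to see the arithmetic, the following minimal sketch derives the transaction-frequency column from operations per session, assuming an average session length of about 720 seconds (a value implied by the table figures, not stated in the text):

```python
# A minimal sketch (not from the original text): per-session operation counts divided
# by the average session length give the per-second transaction frequencies above.
# The 720-second session length is an assumption inferred from the table (1.00
# Default page operation per session maps to roughly 0.00139 operations per second).

AVG_SESSION_SECONDS = 720  # assumed average session length

ops_per_session = {
    "Add Item": 0.24,
    "Basket": 0.75,
    "Default (home page)": 1.00,
    "Product": 4.20,
    "Search": 1.25,
}

for operation, per_session in ops_per_session.items():
    per_second = per_session / AVG_SESSION_SECONDS
    print(f"{operation}: {per_second:.5f} operations per second")
```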

Defining Acceptable Operating Parameters

The primary benchmark for determining whether a Web site is operating at an acceptable level is latency, or how long a user must wait for a page to load after making a request. Note that although some servers can handle every request they receive, the load might create unacceptable response times, requiring a better solution if the site is to operate efficiently and at an acceptable level of service. In general, static content such as HTML pages and graphics does not contribute to server latency as much as dynamic content such as ASP pages or other content that requires database lookups. Even when a Web server can deliver a large number of ASP pages per second, the turnaround time per ASP page can be unacceptable.

In general, reasonable user latency is as follows:

  • Home page: 1 second 

  • Catalog page: 3 to 7 seconds 

  • Credit card verification: 15 to 30 seconds 

Figure 19.3 illustrates the latency experienced by users of a four-processor Web server as the number of users and ASP requests increases.

Figure 19.3 ASP requests per second versus latency 

The capacity of the site in this example is between 700 and 800 concurrent users. The latency rises to unacceptable levels when the number of users exceeds 800. This server's performance peaks at just over 16 ASP requests per second. At that point, users are waiting approximately 16 seconds for their pages, due to extensive context switching.

When you compile a list of shopper operations by viewing Internet Information Services (IIS) 5.0 logs or usage analysis reports, it is important to recognize that one shopper operation can be composed of multiple ASP requests, which the log lists as separate entries. For example, the log might list the five separate ASP requests needed to complete a purchase, as shown in Figure 19.4.

Figure 19.4 Shopper operations needed to complete a purchase 

A shopper operation can also generate more than one ASP request. Figure 19.5 shows that adding a product to the shopping basket generates two ASP requests. The first ASP processes the add request, and the second ASP displays the contents of the shopping basket. The first ASP calls the second ASP, using a Server.Transfer command.

Figure 19.5 ASP requests for adding a product to the shopping basket 

If you want to convert a list of the most commonly performed ASP requests into a list of the most commonly performed shopper operations, you need to do some investigation. ASP requests are recorded in IIS log files and can be used to identify shopper operations. By looking at the log files after completing an operation, you can find the related requests.

Calculating Cost per User

You can use TCA to calculate the processing cost for each concurrent user. You measure cost by creating a load-generating script that exercises each of the identified shopper operations, and then measuring the resource utilization at that load level. Or, instead of creating your own script, you can use the Microsoft Web Application Stress (WAS) tool to generate the load. (For more information about creating scripts and using WAS, see https://www.microsoft.com/technet/archive/itsolutions/intranet/downloads/webstres.mspx.)

The objective of running a script exclusively for an individual operation is to load the IIS/ASP server with as many requests as possible, in order to achieve optimal ASP throughput per second. You have reached optimal ASP throughput when a higher shopper load produces a drop in ASP throughput, or a sudden increase in operation latency or in the number of ASP requests queued.

WAS can be integrated with System Monitor to simplify test data collection: you simply add the following counters:

  • Active Server Pages: Requests per second 

  • System: % Processor Time 

The following table provides a list of important counters to monitor.

| Counter | Measures |
| --- | --- |
| Active Server Pages: Request wait time | Length of time that requests wait to be processed. This should be very close to zero. |
| Active Server Pages: Requests executing | Number of requests executing simultaneously. There should be only one request executing at a time. |
| Active Server Pages: Requests per second | Rate at which the ASPs are processing requests. |
| Active Server Pages: Requests queued | Number of requests waiting for service from the queue. If the number fluctuates considerably during stress and processor utilization remains relatively low, this could be an indication that the script is calling a Component Object Model (COM) object that is receiving more calls than it can handle. |
| CS2000: AuthManager: AuthMgr Objects/sec | Number of Authentication Manager objects created per second. It is useful to track this number because object creations are costly in terms of performance. |
| CS2000: Catalog: Catalog Queries per second | Number of queries made to the catalog system per second. The catalog query rate is the uncached rate. If this rate is high, you should change the application code to take advantage of a local caching mechanism, such as the LRUCache object. |
| CS2000: UPM: No. of Cache Purges | Number of times the foreground thread purged entries from the profile object cache to search for a free block of memory. If this rate is greater than zero, you should increase the amount of space allocated in the Global.asa file. |
| Memory: Available bytes | Total physical memory available to the operating system. This amount of available memory is compared with the memory required to run all of the processes and applications on your server. Try to keep at least 10 percent of memory available for peak use. Keep in mind that, by default, IIS 5.0 uses up to 50 percent of available memory for its file cache, leaving the rest of the memory available for other applications running on the server. |
| Memory: Page faults per second | Memory bottleneck due to page faults. If a process requests a page in memory and the system cannot find it at the requested location, this constitutes a page fault. If the page is elsewhere in memory, it is called a soft page fault. If the page must be retrieved from disk, it is called a hard page fault. Most processors can handle large numbers of soft page faults without consequence, but hard page faults can cause significant delays. If the number of hard page faults is high, you might have dedicated too much memory to the caches, not leaving enough memory for the rest of the system. Sustained hard page fault rates of over five per second are a key indicator of not having enough RAM. Try increasing the amount of RAM on your server or lowering cache sizes. Other counters that can indicate a memory bottleneck are Memory: Pages input/sec, Memory: Page Reads/sec, and Memory: Pages per second. |
| Memory: Pages per second | Number of pages retrieved per second. The number should be less than one per second. |
| Network Segment: Bytes received per second | Number of bytes received per second in a network segment. If a network card approaches its maximum capacity, add another. |
| Physical Disk: Avg. Disk Queue Length | Average disk queue length. If the disk is not fast enough to keep up with read and write requests, requests will queue up. Acceptable queue length is a function of the number of spindles in the array. Other counters that can be used to observe disk traffic include Physical Disk: Disk Reads/second and Physical Disk: Disk Writes/second. If necessary, consider adding more physical drives, such as a Redundant Array of Inexpensive Disks (RAID) system, to increase the number of spindles that can read and write, as well as to increase data transfer rates. |
| Physical Disk: Disk Reads/second, Physical Disk: Disk Writes/second | Number of disk reads and writes per second on the physical disk. Combined, these two counters should be well under the maximum capacity for the disk device. To enable this counter, run diskperf -y from the command shell and reboot the computer. |
| Physical Disk: % Disk Time | Percentage of elapsed time that the selected disk drive is busy servicing read or write requests. Together with the Physical Disk: Avg. Disk Queue Length counter, this is a key indicator of a disk drive bottleneck. Note that the percentages for this counter can vary, depending on which storage solution you use. See the documentation for your storage solution for more information. |
| Process: Inetinfo: Private bytes | Current number of bytes this process has allocated that cannot be shared with other processes. If system performance is degrading over time, this counter can be a good indicator of memory leaks. |
| Process: Thread Count: dllhost | Number of threads created by the pooled out-of-process application (the most recent value). |
| Process: Thread Count: dllhost#1, #2, ..., #N | Number of threads created by the isolated out-of-process application (the most recent value). |
| Process: Thread Count: Inetinfo | Number of threads created by the process you're monitoring (the most recent value). |
| SQL Server: Cache Hit Ratio | Percentage of time that SQL Server finds data in its cache, rather than having to go to disk. To give SQL Server more RAM, use the sp_configure stored procedure or SQL Server Enterprise Manager (Sqlew.exe). |
| SQL Server: I/O transactions/sec | Amount of activity the SQL Server actually performs. |
| SQL Server - Locks: Total Blocking Locks | Number of blocking locks. A high blocking lock count can indicate a database problem. |
| System: Context Switches/sec | Context switches per second. If this number is too high, add another system. |
| System: % Processor Time | Percentage of time that processors are working. When this counter is running consistently between 80 and 100 percent, it is a key indicator of a CPU bottleneck. |
| System: Processor Queue Length | Instantaneous count (not an average) of the number of threads waiting in the queue shared by all processors in the system. A sustained value of two or more threads indicates a processor bottleneck. |
| Thread: Context Switches/sec: Inetinfo => Thread # | Maximum number of threads per processor, or thread pool. Monitor this counter to make sure you are not creating so many context switches that the memory you are losing negates the benefit of added threads. At that point, your performance will decrease rather than improve. Anything over 5,000 context switches per second per server is probably excessive. |
| Thread: % Processor Time: Inetinfo => Thread # | Amount of processor time each thread of the Inetinfo process uses. |
| Web: Total connections | Number of users. |

Calculating Cost per User for CPUs

You must measure hardware capacity on all types of servers in your system if you want an accurate picture of your cost per user. To calculate the performance cost per user for a CPU, use a tool like WAS to simulate a load on the server. Increasing the number of simulated users increases the load; in turn, increasing the number of threads increases the number of simulated users. These threads are spread among the client computers configured in WAS. If the number of threads becomes too great for the clients to handle (for example, if you specify 200 threads, but have only five client computers), change the number of sockets per thread. For example, 40 threads at five sockets per thread have the same effect as 200 threads; either configuration simulates 200 concurrent users.
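
As a small illustration of the thread and socket arithmetic described above (the helper function is ours, not part of WAS):

```python
# Minimal sketch: WAS simulates one virtual user per socket, so the number of
# simulated users is threads multiplied by sockets per thread.

def simulated_users(threads: int, sockets_per_thread: int) -> int:
    return threads * sockets_per_thread

print(simulated_users(200, 1))  # 200 threads, 1 socket each -> 200 users
print(simulated_users(40, 5))   # 40 threads, 5 sockets each -> 200 users
```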

ASP requests per second and CPU use grow with the number of users. However, when CPU use reaches 100 percent, adding more users results in lower ASP requests per second. Therefore, the number of ASP requests processed per second at the point at which CPU use reaches 100 percent is the maximum number of ASP requests your site can handle.

Before you can calculate the cost of a shopper operation, you must know the number of ASP pages used in the operation. For example, checkout operations typically involve several ASP pages, such as a shopper information page, credit card page, shipping page, confirmation page, and so on. Remember to account for ASP pages that users never see, such as action pages, because they are usually posted to and redirected to a continuing page. Make sure to include these "hidden" pages in the WAS script for that operation, or simply use RECORD mode in WAS to record these pages.

You calculate the cost of a shopper operation by multiplying the number of ASP pages by the cost per ASP page. This calculation is based on megacycles (Mcycles). The Mcycle is a unit of processor work. One Mcycle is equal to one million CPU cycles. As a unit of measure, the Mcycle is useful for comparing performance between processors because it is hardware-independent.

Note The following examples illustrate tests run on a dual-processor server. Adding processors changes the way threading and context switching are handled. For this reason, the results of these equations, such as 0.5624 for the CPU Cost per User equation and the result of the Upper CPU Boundary equation, can change if you add processors.

Do not extrapolate the numbers in these examples to predict how a quad-processor system would perform. Different applications scale differently across multiple processors, so these numbers cannot be re-used for other applications and/or other system configurations.

For example, a dual-processor 400 MHz Pentium II Xeon server has a total capacity of 800 Mcycles. Using the maximum number of ASP requests per second, you can calculate the cost per ASP request (operation), as follows:

C = (U * N * S / A) * B 

Where:

C = Cost per operation (cost for all files in the WAS script) 
U = CPU utilization (by percentage) 
N = Number of CPUs 
S = Speed of CPU (in MHz) 
A = ASP requests per second 
B = ASP requests per operation 

For example, if you have a server with two CPUs, and if the browse operation results in 11.50 ASP requests per second with CPU utilization of 84.10 percent, the cost per ASP page is then 84.10% * 2 * 400 / 11.50 = 58.50 Mcycles.
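
The same calculation can be written as a short sketch; the function names are ours, and the figures are the ones from the example above:

```python
# Cost per ASP request, in Mcycles: (U * N * S) / A.
def cpu_cost_per_asp(cpu_utilization, num_cpus, cpu_mhz, asp_requests_per_sec):
    return (cpu_utilization * num_cpus * cpu_mhz) / asp_requests_per_sec

# Cost per shopper operation: cost per ASP request * ASP requests per operation (B).
def cpu_cost_per_operation(cost_per_asp, asp_requests_per_operation):
    return cost_per_asp * asp_requests_per_operation

per_asp = cpu_cost_per_asp(0.8410, 2, 400, 11.50)
print(f"{per_asp:.2f} Mcycles per ASP request")                           # about 58.50
print(f"{cpu_cost_per_operation(per_asp, 2):.2f} Mcycles per operation")
```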

The result of TCA measurement is a set of CPU costs for each shopper operation, as shown in the following table.

| Shopper operation | Optimum ASP throughput (ASP requests per second) | Percentage of CPU (at optimum ASP throughput) | CPU cost per ASP (Mcycles) | ASP requests per operation | CPU cost per operation (Mcycles) |
| --- | --- | --- | --- | --- | --- |
| Add Item | 23.31 | 96.98 | 33.29 | 2 | 66.57 |
| Add Item + Checkout | 18.48 | 94.31 | 40.82 | 7 | 285.74 |
| Add Item + Delete | 22.29 | 95.86 | 34.40 | 4 | 137.61 |
| Basket | 16.81 | 91.73 | 43.64 | 1 | 43.64 |
| Default (home page) | 102.22 | 98.01 | 7.67 | 1 | 7.67 |
| Listing | 21.49 | 91.87 | 34.21 | 1 | 34.21 |
| Lookup | 75.40 | 99.52 | 10.56 | 2 | 21.12 |
| New | 65.78 | 96.61 | 11.75 | 2 | 23.50 |
| Product | 18.23 | 94.81 | 41.61 | 1 | 41.61 |
| Search | 37.95 | 95.11 | 20.05 | 2 | 40.10 |
| Welcome | 148.93 | 96.97 | 5.21 | 1 | 5.21 |

After you have the CPU cost for each operation, you must calculate how often these operations are performed. You can derive this from a typical usage profile to get the CPU cost per user.

You can calculate CPU usage for each shopper operation by multiplying the CPU cost per operation by transaction frequency in operations per second. The result is the CPU usage (in MHz) for each shopper operation:

CPU usage = (CPU cost per operation) * (operations per second) 

The following table shows sample results from this calculation.

| Shopper operation | CPU cost per operation (Mcycles) | Operations per second | CPU usage (Mcycles) |
| --- | --- | --- | --- |
| Add Item | 66.57 | 0.00033 | 0.0222 |
| Add Item + Checkout | 285.74 | 0.00003 | 0.0079 |
| Add Item + Delete | 137.61 | 0.00006 | 0.0076 |
| Basket | 43.64 | 0.00104 | 0.0455 |
| Default (home page) | 7.67 | 0.00139 | 0.0107 |
| Listing | 34.21 | 0.00347 | 0.1188 |
| Lookup | 21.12 | 0.00104 | 0.0220 |
| New | 23.50 | 0.00035 | 0.0082 |
| Product | 41.61 | 0.00583 | 0.2427 |
| Search | 40.10 | 0.00174 | 0.0696 |
| Welcome | 5.21 | 0.00139 | 0.0072 |
| Average CPU usage per shopper (Mcycles) | | | 0.5624 |

The total (0.5624 Mcycles per user) is the performance cost of an average user performing the shopper operations described in the usage profile. You can use this number to estimate the capacity of your site, based on the assumed usage profile. For example, the cost of 100 concurrent users is 100 * 0.5624 = 56.24 Mcycles.
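
The averaging step can be sketched as follows, using the per-operation costs and frequencies from the tables above (the small difference from 0.5624 is rounding in the published figures):

```python
cost_per_operation = {        # Mcycles per operation, from the TCA measurements
    "Add Item": 66.57, "Add Item + Checkout": 285.74, "Add Item + Delete": 137.61,
    "Basket": 43.64, "Default (home page)": 7.67, "Listing": 34.21,
    "Lookup": 21.12, "New": 23.50, "Product": 41.61, "Search": 40.10, "Welcome": 5.21,
}
operations_per_second = {     # transaction frequency from the usage profile
    "Add Item": 0.00033, "Add Item + Checkout": 0.00003, "Add Item + Delete": 0.00006,
    "Basket": 0.00104, "Default (home page)": 0.00139, "Listing": 0.00347,
    "Lookup": 0.00104, "New": 0.00035, "Product": 0.00583, "Search": 0.00174,
    "Welcome": 0.00139,
}

cpu_cost_per_user = sum(cost_per_operation[op] * operations_per_second[op]
                        for op in cost_per_operation)
print(f"CPU cost per user: {cpu_cost_per_user:.4f} Mcycles")           # about 0.56
print(f"Cost of 100 concurrent users: {100 * cpu_cost_per_user:.2f} Mcycles")
```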

The Product operation has a relatively high cost, at 41.61 Mcycles. The frequency of the Product operation is also very high, at 0.00583 operations per second, so the Product operation places a large load on the site. Its share of the per-shopper cost is 0.2427 Mcycles out of 0.5624 Mcycles (approximately 43 percent of the cost of the entire usage profile).

The Listing operation shows a moderate cost, at 34.21 Mcycles.

The Add Item + Checkout operation shows a heavy cost, at 285.74 Mcycles. However, because its frequency is so low (0.00003), it places a small load (0.0079 Mcycles) on the site. This is only approximately 1 percent of the cost of the entire usage profile. Therefore, although classified as a "heavy" operation, the Add Item + Checkout operation places a relatively light load on the server.

Thus, the numbers for this example suggest that it would be best to start optimizing performance with the Product and Listing operations to improve capacity.

The sum of the CPU usage for each shopper operation in a single session is equal to the average CPU usage per user. This makes it possible to calculate CPU usage for any given number of users using the following equation:

C = Min [(N * K), M] 

Where:

C = CPU usage (in MHz)  
Min = Minimum value taken from within the brackets 
N = Number of users 
K = CPU usage per user (in MHz) 
M = Upper CPU boundary (maximum CPU usage) 

You calculate the upper CPU boundary to anticipate the practical ceiling for CPU utilization and user capacity, which falls below 100 percent of total capacity (here, 2 * 400 MHz CPUs, or 800 MHz). Ideally, the CPUs would be fully utilized when you reach user capacity. Usually, however, you reach user capacity while CPU usage is still below the maximum, so you have to calculate the upper CPU boundary.

For example, if the CPU usage per user is 0.5624 MHz and the upper CPU boundary is 526 MHz, then the CPU usage for 100 users is 56.24 MHz:

C = Min [(100 shoppers * 0.5624 MHz), 526 MHz] 
C = Min [56.24 MHz, 526 MHz] 
C = 56.24 MHz  

CPU usage for 935 shoppers would be 526 MHz, which also happens to be the upper CPU boundary:

C = Min [526 MHz, 526 MHz] 
C = 526 MHz  

Shopper loads higher than 935 exceed the upper CPU boundary. This means that at that point, demand exceeds user capacity.

You can calculate the upper CPU boundary from the weighted average of the percentage of the CPU utilization for each shopper operation, as shown in the following table. In this calculation, CPU measurements for each shopper operation are weighted, based on the distribution of shopper operations in the usage profile. The table shows that the upper CPU boundary (M) is 94.86 percent, or 759 MHz. The equation is as follows:

Weighted % CPU = % CPU at optimum ASP throughput * % of total (from usage profile) 

The following table shows the upper CPU boundary (2x400 MHz server) for the shopper operations.

| Shopper operation | Percentage CPU at optimum ASP throughput | Percentage of total (from usage profile) | Weighted percentage CPU |
| --- | --- | --- | --- |
| Add Item | 96.98 | 2.00 | 1.94 |
| Add Item + Checkout | 94.31 | 0.17 | 0.16 |
| Add Item + Delete | 95.86 | 0.33 | 0.32 |
| Basket | 91.73 | 6.25 | 5.73 |
| Default (home page) | 98.01 | 8.33 | 8.17 |
| Listing | 91.87 | 20.83 | 19.14 |
| Lookup | 99.52 | 6.25 | 6.22 |
| New | 96.61 | 2.08 | 2.01 |
| Product | 94.81 | 35.00 | 33.18 |
| Search | 95.11 | 10.42 | 9.91 |
| Welcome | 96.97 | 8.33 | 8.08 |
| Upper CPU boundary (weighted average) | | | 94.86 |

Plugging in values for CPU usage per shopper operation and upper CPU boundary yields the following equation:

Min [(N * 0.5624), 759] 

Figure 19.6 illustrates the values for the previous equation.

Figure 19.6 Projected CPU usage (2x400 MHz server) 

Calculate the capacity as follows:

Concurrent users = CPU capacity / CPU cost per user 

(However, the upper boundary of the CPU is at 94.86 percent of CPU capacity, so True capacity = Max capacity * 94.86%)

Concurrent users = (800 Mcycles * 94.86%) / 0.5624 Mcycles per user 
Concurrent users = 759 Mcycles / 0.5624 Mcycles per user 
Concurrent users = 1,350 

These figures are the calculated or projected user capacity in relation to CPU power.
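
As a short sketch of the capacity calculation, again with the figures from this example:

```python
cpu_capacity_mcycles = 2 * 400   # dual 400 MHz processors
upper_cpu_boundary = 0.9486      # weighted average from the table above
cpu_cost_per_user = 0.5624       # Mcycles per concurrent user

usable_capacity = cpu_capacity_mcycles * upper_cpu_boundary
concurrent_users = usable_capacity / cpu_cost_per_user
print(f"Usable CPU capacity: {usable_capacity:.0f} Mcycles")   # about 759
print(f"Projected concurrent users: {concurrent_users:.0f}")   # about 1,350
```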

In tests, WAS scripts were run on one operation at a time, in order to weigh each operation separately. In a production environment, all operations are called together, creating a much more complex environment, and one in which caching and context switching can make a difference. For this reason, you must perform verification tests, in which all operations are stress-tested together, based on each usage profile.

Using WAS, a sample script was created to simulate shopper load levels in increments of 250 until usage exceeded capacity. Resource utilization and ASP performance monitored with System Monitor produced the results shown in the following table.

| Shopper load | Percentage CPU utilization | Context switches per second | ASP requests per second | ASP request execution time (ms) | ASP request wait time (ms) |
| --- | --- | --- | --- | --- | --- |
| 250 | 17.42 | 4,763 | 4.918 | 87.32 | 0.16 |
| 500 | 37.83 | 5,426 | 9.548 | 111.33 | 0.47 |
| 750 | 54.45 | 7,017 | 15.021 | 117.44 | 0.16 |
| 1,000 | 72.89 | 8,190 | 19.659 | 130.46 | 0.63 |
| 1,500 | 98.37 | 9,470 | 26.607 | 1619.63 | 4636.00 |

At 1,000 users, the ASP request wait time is still well under one millisecond; at 1,500 users, requests wait more than four and a half seconds.

Figure 19.7 shows the same results in a graph. Note that CPU usage increases in a linear fashion until maximum CPU usage (800 MHz) is reached.

Figure 19.7 CPU usage and shopper load (2x400 MHz server) 

If you cannot upgrade or add processors, there are two other steps you can take to improve CPU efficiency:

  • Add network adapters. If you have a multiprocessor system that does not distribute interrupts symmetrically, you can improve the distribution of the processor workload by adding one network adapter for every processor. Generally, you add adapters only when you need to improve the throughput of your system. Network adapters, like any additional hardware, have some intrinsic overhead. However, if one of the processors is nearly always active (that is, if the % Processor Time counter equals 100 percent CPU) and more than half of its time is spent servicing deferred procedure calls (DPCs) (if the % DPC Time counter is greater than 50 percent CPU), then adding a network adapter is likely to improve system performance. Adding a network adapter is a viable option, as long as the available network bandwidth is not already saturated. 

  • Limit connections. Consider reducing the maximum number of connections that each IIS 5.0 service accepts. Limiting connections can result in blocked or rejected connections, but it helps ensure that accepted connections are processed promptly. 

Calculating Memory Cost per User

Because memory usage relates directly to the content of the site (caching, out-of-process dynamic-link libraries (DLLs), and so on), rather than to the number of concurrent users, you must calculate costs carefully. To calculate memory costs, you should monitor the following:

  • Amount of Inetinfo that is paged out to disk (if any) 

  • Memory usage during site operation 

  • Efficiency of cache utilization 

  • Number of times the cache is flushed 

  • Number of page faults that occur 

  • User Profile Management cache 

  • Cache manager for discounts and advertisements

You can specify whether the CacheManager object should be a Dictionary object or an LRUCache object. The Global.asa file in the Retail Solution Site, available for download from https://www.microsoft.com/commerceserver/solutionsites, contains sample code for the CacheManager object.

Internet Information Services (IIS) 5.0 runs in a pageable user-mode process called Inetinfo.exe. When a process is pageable, the system can remove part of or all of it from RAM and write it to disk if there isn't enough free memory.

If part of the Inetinfo process is paged to disk, the performance of IIS 5.0 suffers. It's very important to make sure that your server or servers have enough RAM to keep the entire Inetinfo process in memory at all times because the Web, File Transfer Protocol (FTP), and Simple Mail Transfer Protocol (SMTP) services run in the Inetinfo process. Each current connection is also given about 10 KB of memory in the Inetinfo working set. The working set of the Inetinfo process should be large enough to contain the IIS object cache, data buffers for IIS 5.0 logging, and the data structures that the Web service uses to track its active connections.
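
A rough sizing sketch follows; the 10 KB-per-connection figure comes from the paragraph above, while the base working-set size and connection count are placeholders you would replace with values measured by System Monitor:

```python
KB = 1024

def inetinfo_memory_estimate(base_working_set_bytes, concurrent_connections):
    # Measured Inetinfo working set plus roughly 10 KB per active connection.
    return base_working_set_bytes + concurrent_connections * 10 * KB

estimate = inetinfo_memory_estimate(base_working_set_bytes=40 * KB * KB,  # assumed 40 MB
                                    concurrent_connections=2000)
print(f"Estimated Inetinfo memory: {estimate / (KB * KB):.1f} MB")  # about 59.5 MB
```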

You can use System Monitor to monitor the working set of Inetinfo.exe. In addition to the performance counters listed in the previous "Calculating Cost per User" section, you should also monitor the Inetinfo counters listed in the following table.

| Counter | Measures |
| --- | --- |
| Memory: Page Reads/sec | Hard page faults. This counter displays the number of times the disk is read to satisfy page faults. It displays the number of read operations, regardless of the number of pages read in each operation. A sustained rate of five read operations per second or more can indicate a memory shortage. |
| Memory: Pages input/sec | Cost of hard page faults. This counter displays the number of pages read to satisfy page faults. One page is faulted at a time, but the system can read multiple pages ahead to prevent further hard faults. |
| Process: Inetinfo: Page faults/sec | Hard and soft faults in the working set of the Inetinfo process. |
| Process: Inetinfo: Working set | Size of the working set of the process, in bytes. This counter displays the last observed value, not an average. |

You should log this data for several days. You can use performance logs and alerts in System Monitor to identify times of unusually high and low server activity.

If the system has sufficient memory, it can maintain enough space in the Inetinfo working set so that IIS 5.0 rarely has to perform disk operations. One indicator of memory sufficiency is how much the size of the Inetinfo working set varies in response to general memory availability on the server.

You can use the Memory:Available bytes counter as an indicator of memory availability and the Process:Inetinfo:Working set counter as an indicator of the size of the IIS 5.0 working set. Make sure to examine data collected over time, because these counters display the last value observed, rather than an average.

When you look at page faults, compare your data on the size of the Inetinfo working set to the rate of page faults attributed to the working set. You can use the Process: Inetinfo:Working set counter as an indicator of the size of the working set, and the Process: Inetinfo:Page faults/sec counter to indicate the rate of page faults for the IIS 5.0 process. When you have reviewed data on the varying size of the Inetinfo working set, you can use its page fault rate to determine whether the system has enough memory to operate efficiently. If the system cannot lower the page fault rate to an acceptable level, you should add memory to improve performance.

IIS 5.0 relies on the operating system to store and retrieve frequently used Web pages and other files from the file system cache. The file system cache is particularly useful for servers of static Web pages, because Web pages tend to be used in repeated, predictable patterns.

If cache performance is poor when the cache is small, use the data you have collected to deduce the reason that the system reduced the cache size. Note the available memory on the server and the processes and services running on the server, including the number of simultaneous connections supported.

When you add physical memory to your server, the system allocates more space to the file system cache. A larger cache is almost always more efficient, but each additional megabyte of memory becomes increasingly less efficient than the previous one. You must decide at what point adding more memory produces so little improvement in performance that it ceases to be worthwhile.

Calculating Cost per User for Disks

IIS 5.0 writes its logs to disk, so there is usually some disk activity, even when clients are hitting the cache 100 percent of the time. Under ordinary circumstances, disk activity (other than that generated by logging) serves as an indicator of issues in other areas. For example, if your server needs more RAM, you'll see a lot of disk activity because there are many hard page faults. There will also be a lot of disk activity if your server houses a database or your users request many different pages.

Because IIS caches most pages in memory, the disk system is rarely a bottleneck as long as the Web servers have enough installed memory. However, SQL Server reads and writes to the disk frequently. SQL Server also caches data, but uses the disk a lot more than IIS. You should test disk activity on all servers if disk activity could become a bottleneck.

To measure the disk activity of a site, use System Monitor to record the Physical Disk: Disk Reads/second and % Disk Time counters while a WAS script is running for each shopper operation, such as when calculating the cost per user for the CPU. (The WAS tool cannot report the activity on the SQL Server server if it is running on a different server. If that is the case, use System Monitor instead.)

In our example in this chapter, the percentage of disk utilization is based on a calibration of a maximum of 280 random seeks per second. For example, if the Web server generates 2.168 Add Item operations per second, then the SQL Server server performs 9.530 disk seeks per second (for a disk utilization of 3.404 percent). Calculate the disk cost per operation by dividing disk seeks per second by operations per second (which you will have determined as part of the usage profile). The equations are as follows:

Disk read cost per operation = disk reads per second / operations per second 
Disk write cost per operation = disk writes per second / operations per second 
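
A minimal sketch of these equations, using the Add Item read figures from the following table (writes are measured the same way):

```python
def disk_cost_per_operation(disk_io_per_second, operations_per_second):
    # Disk reads (or writes) consumed by a single operation.
    return disk_io_per_second / operations_per_second

add_item_reads = disk_cost_per_operation(disk_io_per_second=9.530,
                                          operations_per_second=2.168)
print(f"Add Item disk read cost: {add_item_reads:.3f} seeks per operation")  # about 4.396

# Utilization against the calibrated maximum of 280 random seeks per second:
print(f"Disk utilization: {9.530 / 280:.1%}")  # about 3.4 percent
```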

The following table lists the results of calculating disk reads on the site in our example. (Remember that you must also measure disk writes.)

| Shopper operation | Operations per second | Disk seeks per second | Percentage of disk | Disk cost |
| --- | --- | --- | --- | --- |
| Add Item | 2.168 | 9.530 | 3.404 | 4.395 |
| Add Item + Checkout | 0.903 | 19.688 | 7.031 | 7.266 |
| Add Item + Delete | 9.384 | 8.956 | 3.199 | 0.954 |
| Basket | 8.728 | 7.050 | 2.518 | 0.808 |
| Browse | 6.033 | 0.103 | 0.037 | 0.017 |
| Default (home page) | 28.330 | 0.248 | 0.089 | 0.009 |
| Listing | 5.533 | 0.148 | 0.053 | 0.027 |
| Lookup | 12.781 | 0.063 | 0.023 | 0.005 |
| New | 12.196 | 9.275 | 3.313 | 0.760 |
| Search | 8.205 | 0.100 | 0.036 | 0.012 |
| Welcome | 31.878 | 0.080 | 0.029 | 0.003 |

Note This table reflects a busy server with a higher frequency of operations than in some of the other tables in these examples. The disk cost is still the same.

After you calculate the cost, the next step is to calculate the average load per user per second, as shown in the following table.

| Shopper operation | Ratio of hits (percent) | Usage profile operations (over 11 minutes, 660 seconds) | Usage profile operations per second (usage profile operations / 660 seconds) | Cost per operation per second (read + write) |
| --- | --- | --- | --- | --- |
| Add Item | 1.76 | 0.2 | 0.000293 | 4.395 |
| Add Item + Checkout | 1.10 | 0.1 | 0.000183 | 7.266 |
| Add Item + Delete | 1.07 | 0.1 | 0.000178 | 1.274 |
| Browse | 36.61 | 4.0 | 0.006102 | 0.017 |
| Default (home page) | 22.82 | 2.5 | 0.003804 | 0.009 |
| Login | 1.73 | 0.2 | 0.000288 | 0.012 |
| Register | 1.06 | 0.1 | 0.000176 | 3.295 |
| Search (Good) | 14.35 | 1.6 | 0.002391 | 0.012 |
| Search (Bad) | 1.02 | 0.1 | 0.000170 | 0.012 |
| View Cart | 2.53 | 0.3 | 0.000421 | 0.027 |
| Total | | 11.0 | | 16.319 KBps |

These calculations yield a load per user per second of 16.319 kilobytes per second (KBps). You can use this number to determine the capacity of the disk system.

Figure 19.8 shows disk seeks climbing to 4.38 seeks per second for a projected peak load of 400 users. Given that disk performance for the SQL Server server was calibrated at 280 random seeks per second, this translates to a disk utilization of 1.56 percent.

Figure 19.8 Projected disk costs versus shopper load 

You can draw the following conclusions from the calculation results in the previous tables:

  • You can use the results with the following equation to determine how many disk spindles are required for your system (see the sketch after this list):

    Disk spindles required = (disk cost per user per second / disk maximum reads per second) + (disk write cost per operation / disk maximum writes per second) 

  • On multiple disk RAID arrays, the average disk queue length per array should not exceed the number of physical disks per array. If it does, this indicates a bottleneck. 
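
The spindle equation in the first bullet can be sketched as follows; the input values are placeholders for illustration, not measurements from this chapter, and the units follow the equation exactly as stated:

```python
def disk_spindles_required(read_cost_per_user_per_sec, max_reads_per_sec,
                           write_cost_per_operation, max_writes_per_sec):
    return (read_cost_per_user_per_sec / max_reads_per_sec +
            write_cost_per_operation / max_writes_per_sec)

# Placeholder inputs: calibrate your own disks to find the per-disk maximums.
spindles = disk_spindles_required(read_cost_per_user_per_sec=3.6, max_reads_per_sec=280,
                                  write_cost_per_operation=1.2, max_writes_per_sec=200)
print(f"Disk spindles required: {spindles:.3f}")
```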

Adding disk spindles usually means adding another disk to the RAID system. This ensures that there are enough disks to distribute the load between them efficiently.

You must calibrate the disk subsystem to determine the maximum number of reads and writes per second for an individual disk. The disk calibration process performs a large number of uncached reads and writes to the disk to determine the maximum number of reads and writes that the disk array can support. The maximum numbers of reads and writes are functions of disk seek time and rotational latency.

Ultimately, these calculations reveal that the cost per user for a disk in our sample is 0.00360763 KB per user per second, and the site capacity for a disk is 25,600 concurrent users per SQL Server server, based on the test platform's single hard drive.

Calculating Cost per User for Networks

Network bandwidth is another important resource that can become a bottleneck. You can calculate total network cost from the sum of the costs of the individual shopper operations. However, two network costs are associated with each shopper operation—the connection between the Web client and the Web server, and the connection between the SQL Server server and the Web server.

Note On a switched Ethernet LAN, traffic is isolated, so network costs are not added together. On an unswitched Ethernet LAN, network traffic is cumulative, so network costs are added together.

When a shopper performs an operation, the action generates network traffic between the Web server and the Web client, as well as between the Web server and the SQL Server server (if the SQL Server database needs to be accessed).

The Add Item operation, for example, shows that optimal throughput is 2.168 operations per second. The network cost of Add Item is 5.627 KBps per operation between the Web client and the Web server and 129.601 KBps between the Web server and the SQL Server server. Most of the traffic generated by the Add Item operation is between the Web server and the SQL Server database. The following table shows the combined net Web cost and net SQL Server cost as the net total cost of each operation.

| Shopper operation | Net Web cost | Net SQL Server cost | Net total cost |
| --- | --- | --- | --- |
| Add Item | 5.627 | 129.601 | 135.23 |
| Add Item + Checkout | 24.489 | 55.215 | 79.70 |
| Add Item + Delete | 10.763 | 5.392 | 16.16 |
| Basket | 2.750 | 4.010 | 6.76 |
| Default (home page) | 1.941 | 0.000 | 1.94 |
| Listing | 25.664 | 23.134 | 48.80 |
| Login | 17.881 | 1.380 | 19.26 |
| Lookup | 14.475 | 0.861 | 15.34 |
| Main | 24.437 | 9.503 | 33.94 |
| New | 18.859 | 0.492 | 19.35 |
| Product | 21.548 | 21.051 | 42.60 |
| Search | 20.719 | 10.725 | 31.44 |

Net Web cost represents the bytes transmitted per operation between the Web client and the Web server.

Net SQL Server cost represents the bytes transmitted per operation between the SQL Server server and the Web server.

Net total cost represents the total bytes transmitted per operation on an unswitched Ethernet LAN, where costs are added together. On a switched Ethernet LAN network, costs are separate because the segments are isolated.
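
The per-user figures in the next table are simply these per-operation costs weighted by transaction frequency; a minimal sketch with the Add Item numbers:

```python
def cost_per_user_per_second(net_cost_per_operation, operations_per_second):
    return net_cost_per_operation * operations_per_second

web_cost = cost_per_user_per_second(5.627, 0.000293)    # Web client <-> Web server
sql_cost = cost_per_user_per_second(129.601, 0.000293)  # Web server <-> SQL Server

print(f"Web cost per user per second: {web_cost:.6f} KB")          # about 0.001649
print(f"SQL Server cost per user per second: {sql_cost:.6f} KB")   # about 0.037973
print(f"Total (unswitched LAN): {web_cost + sql_cost:.6f} KB")     # about 0.039622
```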

The following table shows the total bytes transmitted per operation (total network cost per user per second) on an unswitched Ethernet LAN.

| Shopper operation | Ratio of hits (percent) | Usage profile operations per second | Net Web cost | Net SQL Server cost | Cost per user per second (Web server) | Cost per user per second (SQL Server) | Total cost per user per second (added together, unswitched LAN) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Add Item | 1.76 | 0.000293 | 5.627 | 129.601 | 0.001649 | 0.037973 | 0.039622 |
| Add Item + Checkout | 1.10 | 0.000183 | 24.489 | 55.215 | 0.004481 | 0.010104 | 0.014586 |
| Default (home page) | 22.82 | 0.003804 | 1.941 | 0.000 | 0.007384 | 0 | 0.007384 |
| Listing | 2.53 | 0.000421 | 25.664 | 23.134 | 0.010805 | 0.009739 | 0.020544 |
| Login | 1.73 | 0.000288 | 17.881 | 1.380 | 0.00515 | 0.000397 | 0.005547 |
| Product | 36.61 | 0.006102 | 21.548 | 21.051 | 0.131486 | 0.128453 | 0.259939 |
| Register | 1.06 | 0.000176 | 5.627 | 129.601 | 0.00099 | 0.02281 | 0.0238 |
| Search (Bad) | 1.02 | 0.000170 | 20.719 | 10.725 | 0.003522 | 0.001823 | 0.005345 |
| Search (Good) | 14.35 | 0.002391 | 20.719 | 10.725 | 0.049539 | 0.025643 | 0.075183 |

The following table shows how total network traffic escalates on an unswitched network at a cost of 0.459729 KBps per user.

| Number of users | Total cost per user per second | Total network traffic (KBps) |
| --- | --- | --- |
| 100 | 0.459729 | 45.9729 |
| 200 | 0.459729 | 91.9458 |
| 300 | 0.459729 | 137.9187 |
| 400 | 0.459729 | 183.8916 |
| 500 | 0.459729 | 229.8645 |
| 600 | 0.459729 | 275.8374 |
| 700 | 0.459729 | 321.8103 |
| 800 | 0.459729 | 367.7832 |
| 1,000 | 0.459729 | 459.729 |
| 1,200 | 0.459729 | 551.6748 |
| 10,000 | 0.459729 | 4,597.29 |
| 20,000 | 0.459729 | 9,194.58 |
| 100,000 | 0.459729 | 45,972.9 |

Even in an unswitched network, the traffic on the network is low. However, this can still cause a potential bottleneck, because it is possible to have many servers on the same network hosting the site.

If the network is a Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Ethernet network running at 100 megabits per second (Mbps), or 12.5 megabytes per second (MBps) (100 megabits / 8 bits per byte), then collisions will cause network congestion. For this reason, you should not push network utilization over 36 percent, which means no more than 4.5 MBps on the network. The network illustrated in the previous table reaches the 4.5 MBps threshold at about 10,000 users, which is the site's capacity. At 20,000 users, the network will become congested due to excessive collisions, and will therefore cause a bottleneck. To add capacity, you can move to a switched network, or at least separate the Web network traffic from the network traffic on the SQL Server server.
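
A minimal sketch of this capacity check, using the utilization ceiling and per-user cost from this example (the result lands close to the 10,000-user figure cited above):

```python
LINK_MBPS = 100
link_kbytes_per_sec = LINK_MBPS * 1000 / 8          # 12,500 KBps (12.5 MBps)
usable_kbytes_per_sec = link_kbytes_per_sec * 0.36  # about 4,500 KBps (4.5 MBps)

cost_per_user_kbps = 0.459729
max_users = usable_kbytes_per_sec / cost_per_user_kbps
print(f"Approximate network capacity: {max_users:,.0f} concurrent users")  # about 9,800
```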

Note Remember to measure network traffic for the entire site and not just for individual servers.

There are two primary flows of network traffic to consider in a typical site: Web client to Web server and Web server to SQL Server server. Sites that are more complex can have more flows, depending on the number of servers and the architecture of the site.

Network capacity can become a bottleneck to your site as it grows, especially on sites where the ASP content is relatively simple (low CPU load) and the content (like static HTML or pictures) is relatively large. A few servers can easily serve the content to thousands of users, but the network might not be equipped to handle it. Most of the traffic on the network flows between the Web server and the SQL Server server.

Finally, these examples reveal that the cost per user for the network in the site in this example is 0.459729 KB per user per second, and the site capacity for the network is 10,000 concurrent users, based on a 100 Mbps unswitched network.

Managing Performance

Managing the performance of your Commerce Server 2000 site largely consists of finding and removing bottlenecks. A bottleneck is hardware or software that is operating at maximum capacity. As the load approaches maximum capacity, the bottleneck begins to restrict the flow of work through the system. Performance tools can help you determine what hardware or software has reached its limit. You can then improve the hardware, change the configuration, or tune the software to improve overall performance.

Performance is only one factor in developing your Commerce Server site. Other important factors include ease of development and maintenance, time to market, availability of good programming tools, and in-house site developer expertise. Optimizing for performance can affect any of these other factors.

Web development is driven by a business case that determines priorities. For example, project goals might specify a particular programming language or data-access technology, and such decisions always affect performance to some degree. It is very important to determine the necessary level of performance appropriate to your Web site, then develop the site and manage it to that level of performance.

Web sites often run on multiple physical tiers, each of which has its own hardware, system software, and application software. As a result, Web applications can have many types of performance problems: hardware (client computer, Web server, database server, the network), system software (operating systems, networking software, system services), client applications, browsers, logical database, physical database, data access, and so on.

You can use the following questions to help determine your performance tuning goals:

  • Will the performance of this site meet our goals today and in the future?

  • What hardware and software configuration do we need to meet our performance goals?

  • Will the site run on our existing hardware and software configuration?

  • Can we expect our current configuration to become a bottleneck?

  • How many users can our site support? 

  • What will it cost to develop this site (hardware, software, and development)? 

Monitoring performance regularly is the only way to be sure that the site is meeting its specified performance goals. Regular performance monitoring can also provide an early warning when a change degrades performance. You can collect performance data by using existing system tools, by having the site report on its own performance, or by building special client applications to drive the system.

When you tune performance, you should measure system performance first to see if it meets your goals. If performance doesn't meet your goals, find the bottleneck, remove it, and then repeat the process. Remember to stop when you reach your performance goal. You can always increase performance further, but when site performance meets your goals, additional tuning is generally not cost-effective.

This section describes a performance tuning methodology that you can use repeatedly. You should manage and document your performance tuning process carefully by working systematically and according to your plan.

Identifying Site Constraints

Management uses a business case to determine priorities for developing your Commerce Server site, and often there are higher priorities than simply achieving the maximum possible performance. These constraints cannot be altered in search of higher performance. For example, one business requirement might be that no changes should affect maintainability or time to market, even if those changes might improve performance. As a result, performance work must focus on factors that are not constrained.

Hardware is one factor that can often be changed to improve performance. Buying bigger and faster servers or using more servers and partitioning the load can be cost-effective ways to improve performance. If you plan to add more servers, you need to design the site accordingly.

Another alternative is to tune other parts of the system. The database is a critical factor in overall system performance. Designing an efficient, logical database and tuning the physical database are crucial to achieving good performance.

If site performance still falls short of your goal, ask yourself the following questions:

  • Should we use a different programming language or a different data-access technology?

  • Can the database be housed on a separate server?

  • Can more stateless components be used? 

Defining Load

You can analyze usage log files to determine the load factors in the following table.

| Load factor | Description |
| --- | --- |
| Number of concurrent users | Number of users visiting your site at the same time. |
| Time between operation calls (or think time) | Average delay between the time when a user receives one reply and submits another request. For example, if the operation is called only once every two seconds, the operation only has to be faster than two seconds in order to perform as needed. |
| Number of ASP requests per operation | Number of ASP requests in an operation. Performance must be measured in operations per second, but System Monitor measures only ASP requests per second. To know how many ASP requests are in an operation, either review the ASP code or analyze a Network Monitor capture file of a single operation. |
| Number of static pages versus number of dynamic pages | Number of static and dynamic pages in your application. |
| Number of secure pages versus number of non-secure pages | Number of secure pages and pages containing no security in your application. |
| Variation in load over time | Difference between the average load and peak load. |

Setting Performance Goals

After you define site constraints, services the site provides, and demand for those services, you can set specific performance goals for your site.

First, you should choose metrics for evaluating the performance of your site. One common metric is total system throughput. Throughput is often expressed as ASP requests per second. You should measure throughput per operation, because each operation has its own inherent value.

Another common metric is required response time. Response time is the time between the submission of a request and the receipt of the reply. Response time requirements are often expressed by specifying the ninety-fifth percentile. For example, a required response time of one second means that 95 percent of client calls must return in less than one second.
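
As a small illustration of checking a ninety-fifth percentile goal against measured response times (the sample latencies below are invented for the example):

```python
def percentile(samples, pct):
    # Nearest-rank percentile: the value below which pct percent of samples fall.
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [220, 340, 410, 450, 460, 480, 510, 520, 530, 560,
                580, 600, 640, 700, 720, 760, 810, 850, 920, 1400]

p95 = percentile(latencies_ms, 95)
print(f"95th percentile response time: {p95} ms")
print("Meets 1-second goal" if p95 < 1000 else "Misses 1-second goal")
```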

Next, you must choose required values for your metrics. Setting explicit performance goals is the key step in tuning system performance. The result of this step is to determine how many operations or transactions your site must support per second. After you have chosen required values and set specific goals, you iterate through a series of controlled performance tests until you reach your goals. Use the procedures in the following table to tune system performance.

| Procedure | Description |
| --- | --- |
| Measure application performance on the target platform | If performance equals your goal, you are done. When you reach your goal, stop tuning performance. (You should, however, continue to monitor performance to be sure that it continues at a satisfactory level.) If performance does not equal your goal, go on to the next procedure. |
| Find the bottleneck | If performance does not equal your goal, use performance monitoring tools to find the bottleneck. You don't have to find all of the bottlenecks at once. After you identify a few bottlenecks, determine which one will yield the biggest performance increase when fixed. Keep the cost of fixing the bottleneck in mind (hardware, software, and extra development) when determining which bottleneck to fix. |
| Fix the bottleneck | Form a hypothesis as to what is causing the bottleneck. Devise a fix for the problem. Apply the fix. This step is not always easy, of course. Sometimes performance tools do not clearly identify the problem. When that happens, you have to experiment with one factor at a time. The more you know about the site and the system, and the more experience you have managing performance, the better you will be at finding performance problems and determining the best solutions for them. |
| Repeat the tuning process | Make sure that the changes you made did not introduce new errors, and then repeat the tuning process. The only way to know if the change actually improved performance is to measure performance again. Sometimes you must undo a change because it had no effect or even made performance worse. |

Measuring Performance

Measuring performance accurately can be extremely challenging due to the complex nature of systems. This section describes how to measure the performance aspects of:

  • Memory 

  • Processor capacity 

  • Network 

  • Disk access 

  • Database 

  • Security 

  • Bottlenecks 

  • Optimization 

For most sites, you can simply gather performance data using WAS to run scripts that constantly request the ASP pages for a specified operation. While the WAS scripts run, you can use System Monitor to monitor selected counters. (WAS can also capture these counters, but it displays them in text format, not graphically.) The best way to get an overview of system performance is to chart a set of System Monitor counters for every performance test. Choose counters that can indicate common bottlenecks, such as those listed previously in this chapter.

WAS is designed to simulate multiple browsers requesting pages from a Web site. This tool can realistically simulate many requests with relatively few client computers. However, you must be sure that you have an adequate number of client computers. Beyond a certain point, the overhead of context switching on a client computer limits how effectively it can simulate a large number of virtual users, giving skewed results.

Sometimes the only way to measure performance is to program hooks into the system to log performance metrics. (A hook is a location in a routine or program in which other routines can be inserted, in this case to log metrics.) Although these hooks can impact application performance, they can be helpful if it is critical for you to know a particular performance measurement.
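For example, the following ASP fragment is a minimal timing hook, assuming a hypothetical log path and section name; it measures one section of a page and appends the result to a log file:

    <%
    ' Minimal timing hook (sketch). The log path and the "ProductLookup" label
    ' are assumptions; the IUSR account must be able to write to the path.
    Dim tStart, tElapsed
    tStart = Timer                         ' seconds elapsed since midnight

    ' ... the code being measured, for example a catalog query, goes here ...

    tElapsed = Timer - tStart
    Dim fso, logFile
    Set fso = Server.CreateObject("Scripting.FileSystemObject")
    Set logFile = fso.OpenTextFile("D:\Logs\PerfHooks.log", 8, True)   ' 8 = ForAppending
    logFile.WriteLine Now & vbTab & "ProductLookup" & vbTab & tElapsed & " seconds"
    logFile.Close
    Set logFile = Nothing
    Set fso = Nothing
    %>

Because the hook itself writes to disk on every request, use it sparingly and remove it (or switch it off) once you have the measurements you need.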

It is hard to accurately identify bottlenecks from performance data, but there are some telltale signs that indicate bottlenecks. For example, if available memory falls below 4 MB, the system is probably accessing the disk too often. To solve this, add more memory.

Another indication of a bottleneck is fluctuations in the Active Server Pages: Requests queued performance counter, which indicates the number of requests waiting for service from the queue. If the requests queued fluctuate considerably during a stress test and processor utilization remains relatively low, this is an indication that the script is calling a server COM component that is receiving more calls than it can handle. In this case, the server COM component is probably the bottleneck.

The following table lists ways in which you might measure the performance of each element of the site.

Element

Description

Client

Use WAS to record or create scripts that request an ASP page, and measure the response time.

SQL Server

Use either Query Analyzer (analyzes individual query time) or SQL Profiler to measure response time for SQL Server.
If SQL Server is the bottleneck, first try to optimize SQL Server itself, making sure it has enough memory, the configuration is optimized, you have the best indexes where they matter the most, and so forth. Sometimes, however, the only way to substantially improve performance is to optimize the design of the database, redesign queries, convert queries to stored procedures, and so forth.

Data access method (ActiveX Data Objects (ADO))

If ADO is the bottleneck, make sure that your application is using it correctly (see the sketch after this table). If optimizing your ADO code still doesn't produce the performance improvements you're looking for, consider using the LRUCache object instead.

COM

Use tools such as COM+ to measure the performance of COM objects.
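The following fragment is a minimal sketch of the kind of ADO usage the table assumes; the connection string, table, and column names are hypothetical. The general pattern is to open the connection as late as possible, use a forward-only, read-only cursor for simple reads, and release objects as soon as you are finished with them:

    <%
    ' Sketch of straightforward ADO usage in an ASP page (all names are hypothetical).
    Const adOpenForwardOnly = 0, adLockReadOnly = 1
    Dim cn, rs
    Set cn = Server.CreateObject("ADODB.Connection")
    cn.Open "Provider=SQLOLEDB;Data Source=SQL01;Initial Catalog=Retail;Integrated Security=SSPI"

    Set rs = Server.CreateObject("ADODB.Recordset")
    rs.Open "SELECT TOP 10 Headline FROM Headlines ORDER BY PublishedDate DESC", _
            cn, adOpenForwardOnly, adLockReadOnly

    Do While Not rs.EOF
        Response.Write Server.HTMLEncode(rs("Headline")) & "<br>"
        rs.MoveNext
    Loop

    ' Release objects as early as possible so the connection returns to the pool.
    rs.Close: Set rs = Nothing
    cn.Close: Set cn = Nothing
    %>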

Memory

Performance bottlenecks caused by memory shortages can often appear to be problems in other parts of the system, so you should monitor memory first to verify that your server has enough, then move on to other components. A dedicated Web server needs at least 128 MB of RAM to run Microsoft Windows 2000, IIS 5.0, and Commerce Server, but 256 MB to 1 GB is usually better. Since the IIS file cache is set to use up to half of available memory by default, the more memory you have, the larger the IIS file cache can be.

Note Microsoft Windows 2000 Advanced Server can support up to 8 GB of RAM, but the IIS file cache will not use more than 4 GB.

To determine whether you have enough memory on your server, use System Monitor to graphically display counter readings as they change over time. Also, monitor your cache settings. Adding memory alone won't necessarily solve performance problems. You need to be aware of IIS cache settings and how they affect server performance. If these settings are not appropriate for the loads placed on your server, the cache settings, rather than a lack of memory, can cause performance bottlenecks.

Processor Capacity

Processor bottlenecks occur when one or more processes consume most of the processor time, forcing other process threads to wait in a queue. IIS 5.0 scales effectively across two to four processors, providing more processor time. Consider the business needs of your Web site if you're thinking about adding more processors.

If you primarily host static content on your server, a two-processor computer is likely to be sufficient to prevent bottlenecks. If you host dynamically generated content, a four-processor setup might be sufficient. However, if the workload on your site is highly CPU-intensive, no single computer can keep up with requests. If this is the case, you should scale your site across multiple servers, using Network Load Balancing (NLB) or a hardware load balancer. If you already run your site on multiple servers, and you are still experiencing performance bottlenecks, consider adding more servers.

Networks

The network is the line through which clients send requests to your server. The time it takes for those requests and responses to travel back and forth is one of the largest limiting factors in user-perceived server performance. This latency is almost completely out of your control. There is little you can do about a slow router on the Internet or the physical distance between a client and your server, except possibly setting up geographically distributed Web servers.

On a site consisting primarily of static content, network bandwidth is the most likely source of a performance bottleneck. Even a fairly modest server can completely saturate a T3 connection (45 Mbps) or a Fast Ethernet connection (100 Mbps). You can mitigate the problem somewhat by tuning your network connection and maximizing your effective bandwidth.

The simplest way to measure effective bandwidth is to determine the rate at which your server sends and receives data. There are a number of performance counters that measure data transmission in many components of your server. These include counters on the Web, FTP, and SMTP services, the TCP object, the IP object, and the Network Interface object. Each of these counters reflects different Open Systems Interconnection (OSI) layers. The following table lists two of the main counters you should monitor to measure network bandwidth performance.

Counter

Measures

Network Segment: Bytes received per second

Bytes received per second on a segment of the network. Compare this counter to the total bandwidth of your network adapter card to determine whether your network connection is creating a bottleneck. To allow room for spikes in traffic, you should usually use no more than 50 percent of capacity. If this number is very close to the capacity of the connection, and processor and memory use are moderate, then the connection might be a problem.

Web: Maximum connections
Web: Total connections

Maximum simultaneous connections and total connections to the Web service. Monitor these two counters to see whether your Web server is able to use as much of the connection as it needs, especially if you are running other services on the computer that also use the network connection. Compare these numbers to memory and processor usage figures so that you can be sure that the connection is the problem, not one of the other components.

Disk Access

Disk access is another common performance bottleneck, especially for database-intensive applications. Both Microsoft Distributed Transaction Coordinator (MSDTC) and SQL Server keep durable logs, and they must write their log entries to disk before they commit each transaction. When transaction rates are high, writing to these logs generates a lot of disk activity. It is often a good idea to provide dedicated disk drives for both logs.

Since IIS 5.0 also writes logs to disk, there is regular disk activity even with 100 percent client cache hits. Generally speaking, if there is high disk read activity other than logging, other areas of your system need to be tuned. For example, hard page faults cause large amounts of disk activity, but are indicative of insufficient RAM, not insufficient disk space.

Accessing memory is faster than accessing disks by a factor of roughly one million; so clearly, searching the hard disk to fill requests degrades performance. The type of site you host can have a significant impact on the frequency of disk seeks. If your site has a very large file set that is accessed randomly, if the files on your site tend to be very large, or if you have a very small amount of RAM, then IIS is unable to maintain copies of the files in RAM for faster access.

Typically, you should use the Physical Disk counters to watch for spikes in the number of disk reads when your server is busy. If you have enough RAM, most connections will result in cache hits, unless you have a database stored on the same server and clients are making dissimilar queries, which precludes caching. Be aware that logging can also cause disk bottlenecks. If there are no obvious disk-intensive issues on your server, but you see a lot of disk activity anyway, you should check the amount of RAM on your server immediately to make sure you have enough memory.

Database

To enhance database-driven performance in a production environment, use SQL Server. Both IIS and SQL Server perform best with plenty of memory, so try storing the database on a separate server from the Web service. Even though requests must then cross computer boundaries, this is frequently faster than running both services on a single computer, because IIS and SQL Server no longer compete for the same memory and processor time. Also be sure to create and maintain good indexes to minimize input/output (I/O) on your database queries. Take advantage of stored procedures, which take much less time to execute and are easier to write than an ASP script designed to do the same task.
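For example, the following sketch calls a hypothetical stored procedure (GetHeadlines) through an ADO Command object rather than assembling the SQL statement in the ASP page; the procedure name, parameter, and connection string are assumptions:

    <%
    ' Sketch: calling a stored procedure from an ASP page (names are hypothetical).
    Const adCmdStoredProc = 4, adVarChar = 200, adParamInput = 1
    Dim cn, cmd, rs
    Set cn = Server.CreateObject("ADODB.Connection")
    cn.Open "Provider=SQLOLEDB;Data Source=SQL01;Initial Catalog=Retail;Integrated Security=SSPI"

    Set cmd = Server.CreateObject("ADODB.Command")
    Set cmd.ActiveConnection = cn
    cmd.CommandText = "GetHeadlines"
    cmd.CommandType = adCmdStoredProc
    cmd.Parameters.Append cmd.CreateParameter("@Category", adVarChar, adParamInput, 20, "Business")

    Set rs = cmd.Execute
    Do While Not rs.EOF
        Response.Write Server.HTMLEncode(rs("Headline")) & "<br>"
        rs.MoveNext
    Loop

    rs.Close: Set rs = Nothing
    Set cmd = Nothing
    cn.Close: Set cn = Nothing
    %>

Because SQL Server compiles and caches the execution plan for a stored procedure, repeated calls avoid the cost of parsing and planning the full SQL text on every request.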

Hot spots in the database can become bottlenecks in the system. A hot spot occurs when many transactions are trying to access the same resource, such as an index, a data page, or a row at the same time. When that happens, many transactions get blocked, which reduces concurrency, and which in turn decreases system throughput. SQL Server prevents concurrency anomalies by using locks to protect data accessed in a transaction. The locks block other transactions that try to access the locked data area until the first transaction completes.

Security

Balancing performance with users' concerns about the security of your Web applications is one of the most important issues you will face, particularly if you have an e-commerce Web site. Since secure Web communication requires more resources than non-secure Web communications, it is important that you know when to use various security techniques, such as the Secure Sockets Layer (SSL) protocol or Internet Protocol (IP) address checking, and when not to use them. For example, your home page or a Search results page probably doesn't need to be accessed through SSL. However, a Checkout or Purchase page needs to be secure.
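One simple way to enforce this split, as a sketch (the page arrangement is an assumption), is to check the HTTPS server variable at the top of pages that must be secure and redirect non-SSL requests:

    <%
    ' Sketch: require SSL on the checkout page only (assumes SSL is enabled for the site).
    ' Query string handling is omitted for brevity.
    If LCase(Request.ServerVariables("HTTPS")) = "off" Then
        Response.Redirect "https://" & Request.ServerVariables("SERVER_NAME") & _
                          Request.ServerVariables("SCRIPT_NAME")
    End If
    %>

Pages that do not need protection, such as the home page or Search results, continue to be served over plain HTTP and avoid the encryption overhead.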

If you use SSL, remember that establishing the initial connection is five times as expensive as reconnecting using security information in the SSL session cache. The default timeout for the SSL session cache is five minutes in Windows 2000. After the cache is flushed, the client and server must establish a completely new connection. Make sure that you enable HTTP keep-alive connections (that is, persistent connections) because SSL sessions don't expire when used in conjunction with HTTP keep-alive connections unless the browser explicitly closes the connection.

The most common way to measure security overhead is to run tests comparing server performance with and without a security feature. You should run the tests with fixed workloads and a fixed server configuration, so that the security feature is the only variable. During the tests, you should measure the elements in the following table.

Element

Description

Processor activity and the processor queue

Authentication, IP address checking, the SSL protocol, and encryption schemes are security features that require significant processing. If there are performance bottlenecks, you will probably see increased processor activity, both in privileged and user modes, and an increase in the rate of context switches and interrupts. If the processors are not sufficient to handle the increased load, queues will develop. Custom hardware, such as cryptographic accelerators, can address this problem.

Physical memory used

Security requires that the system store and retrieve more user information. Also, the SSL protocol uses long keys for encrypting and decrypting the messages (40 bits to 1,024 bits long).

Network traffic

Security features such as authentication and IP address checking will probably result in an increase in traffic between the IIS 5.0 server and the domain controller used to authenticate logon passwords and verify IP addresses.

Latency and delays

The most obvious performance degradation resulting from complex security features like SSL is the time and effort involved in encryption and decryption, both of which use many processor cycles. Downloading files from servers using the SSL protocol can be 10 to 100 times slower than from servers that are not using SSL.

If a server is used both for running IIS 5.0 and as a domain controller, the proportion of processor use, memory, and network and disk activity consumed by domain services is likely to increase the load on these resources significantly. As a result, you should not run IIS 5.0 on a domain controller.

Bottlenecks

Finding bottlenecks or hot spots is sometimes more of an art than a science. The trick is to find a true bottleneck, and not just the symptom of a bottleneck. The suggestions in the following table can help you set up and track your performance testing.

Suggestion

Description

Check the functional correctness and performance of your application

You can get definitive performance results only when the application is functionally correct. Although you should consider performance implications during design and implementation, your application must work before you can start tuning performance.
Keep in mind that you might have to redesign parts of the application to meet performance goals. Of course, any change can introduce errors, so test the correctness of the application whenever you make any changes.

Make the tests repeatable

Run the same tests both before and after any change, to measure the impact of the change. Use the same transaction mix, the same clients generating the same load, and so on. Keep the same hardware and the same software configurations. Run the same system services. Don't run other applications, such as e-mail, during test runs, because their behavior might differ from one run to the next. If network traffic is a factor, you might have to test on a private network.

Take careful notes

Record the results of each test and any changes from previous tests. Successful performance tuning depends on working systematically. Keep a performance log and take written notes on each test run. Describe the configuration, especially any changes from the previous test. Record performance and the data gathered from performance monitoring tools.

Change only one factor at a time

If you change more than one factor in a single test, and performance changes, you won't be sure which factor impacted performance. Whenever possible, change only one factor each time you run a test.

Bottlenecks are unavoidable. There is always something preventing a system from operating at optimum performance levels. The trick is to find and remove the biggest bottlenecks that are also the easiest and least expensive to fix. For example, it might be simple to rewrite an ASP page, but it might not be as simple to rewrite a Microsoft Visual Basic COM object in Microsoft Visual C++. You must evaluate the potential performance gain against the cost of the work and the time necessary to do the optimization.

For example, suppose the site currently processes about 7 operations per second, but the goal is 10 operations per second. The time per operation must decrease by roughly 40 milliseconds (from about 143 milliseconds to 100 milliseconds). Rewriting a Visual Basic COM object in Visual C++ is estimated to save 17 milliseconds; rewriting the ASP code is estimated to save 35 milliseconds; installing more memory on the SQL Server computer is estimated to save 7 milliseconds. To save 40 milliseconds per operation, you can optimize the ASP code and add more memory to the SQL Server computer. Both options are inexpensive compared to rewriting the COM object, even though the rewrite would give you even better performance. However, your priority should be to reach your performance goals, which you can do without rewriting the COM object.

Optimizations

The following table lists some elements you should consider optimizing for your site.

Optimization

Description

Optimize your code

Try to conserve memory, use objects for as short a time as possible, avoid expensive loops, and so forth. Remember that there is a difference between optimizing and rewriting code. Rewrite code only when optimization techniques don't give you the performance you need.

Optimize components

Make sure you call the components in the correct and most efficient manner, as late as possible, and release them as early as possible. Try to take advantage of connection pooling and/or component pooling (COM+) as much as possible.

Minimize or eliminate steps that cause bottlenecks

Try making dynamic pages static, refreshing the content on a scheduled basis. Many sites contain steps that are repeated often, such as retrieving headlines from SQL Server. Although such steps might be relatively cheap in processing cost, calling them frequently can make them bottlenecks.
A popular way of solving this type of problem is to schedule a custom program to retrieve the headlines from SQL Server every 10 minutes and write them into an HTML file, which the site can then use (see the first sketch after this table). This is a much less expensive way of performing the operation, because SQL Server is queried for the headlines only every 10 minutes instead of hundreds of times per minute.

Convert slow-running ASP pages to ISAPI extensions

Internet Server Application Programming Interface (ISAPI) can be an extremely fast and efficient way to handle Web server requests. A code example for a basic checkout process is provided on the Commerce Server 2000 Resource Kit CD.

Optimize design and architecture

Good design can increase performance better than most other optimizations.
Carefully consider how you could optimize the design and architecture of the following areas:
  • Site (Web servers, SQL Server servers, and other servers)
  • Database (partition the database and optimize the database schema)
  • Site logic (flow of users and data)
  • Content distribution (in a server farm environment)
  • Load distribution
  • Asynchronous Message Queuing or e-mail (for updating systems)
  • Commerce Server pipelines

Consider new bottlenecks

It is important to examine any proposed optimization in context before you do any work. Remember to consider the entire system as a whole, not just one ASP page or even one Web server.
For example, creating a SQL Server index might result in a great performance gain for one operation (retrieving data), but could create a new bottleneck for a different operation (inserting data).

Consider new errors

Note all areas affected by changes in architecture, components, or code (directly and indirectly).
It is not unusual to introduce a new error into a system when you make changes. All changes should be reviewed carefully to ensure that the system works the way you intended after the changes are made. Be sure to test the entire system, not just the part that changed.

Upgrade to larger L2 caches

If you add or upgrade processors, choose processors with a large secondary (L2) cache. Server applications need a large processor cache because their instruction paths involve many different components and they need to access a lot of data. A large processor cache (2 MB or more if it is external, up to the maximum available if it is on the CPU chip) will improve performance.

Upgrade to faster CPUs

Web applications particularly benefit from faster processors.

Use Expires headers

Set Expires headers on both static and dynamic content so that both types of content can be stored in the client's cache. This results in faster response times, less load on the server, and less traffic on the network.
For example, you can set an Expires header so that the browser does not download your company's logo file again if the user has already visited your site. To set Expires headers for static content, use the HTTP Headers property sheet. To set Expires headers for dynamic content, use the Response.AddHeader method (see the second sketch after this table).

Enable ASP buffering

ASP buffering is on by default after a clean install of Windows 2000. However, if you have upgraded from Microsoft Windows NT 4.0, you might need to turn it on. ASP buffering collects all output from the application in the buffer before sending it across the network to the client browser. This reduces network traffic and response times.
Although buffering reduces response times, users might have the perception that the page is slower and less interactive, because they see no data until the page has finished executing. Judicious use of the Response.Flush method can increase the perception of interactivity.

Reduce file sizes

You can increase the performance of your Web server by reducing file sizes. Image files should be stored in an appropriate compressed format. Limit the number of images and other large files whenever possible. You can also reduce file size by "tightening up" HTML and ASP code.

Store log files on separate disks and remove nonessential information

If your server hosts multiple sites, a separate log file is created for each site, which can cause a bottleneck. Avoid logging non-vital information and try storing logs on a separate partition or disk from your Web server.

Use RAID and striping

To improve disk access, use RAID and striped disk sets. Consider using a drive controller with a large RAM cache. If your site relies on frequent database access, move the database to a separate computer.
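The following two sketches illustrate the headline-caching and Expires-header suggestions in the preceding table. The first is a small Windows Script Host script that Task Scheduler could run every 10 minutes; it queries SQL Server once and rewrites a static HTML fragment that the ASP pages include. The file path, query, and schedule are assumptions, and the headline text is assumed to be safe to emit as HTML.

    ' RefreshHeadlines.vbs (sketch): scheduled to run every 10 minutes.
    Option Explicit
    Const adOpenForwardOnly = 0, adLockReadOnly = 1
    Const ForWriting = 2
    Dim cn, rs, fso, htmlFile, html

    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=SQLOLEDB;Data Source=SQL01;Initial Catalog=Retail;Integrated Security=SSPI"

    Set rs = CreateObject("ADODB.Recordset")
    rs.Open "SELECT TOP 10 Headline FROM Headlines ORDER BY PublishedDate DESC", _
            cn, adOpenForwardOnly, adLockReadOnly

    html = "<ul>" & vbCrLf
    Do While Not rs.EOF
        html = html & "  <li>" & rs("Headline") & "</li>" & vbCrLf
        rs.MoveNext
    Loop
    html = html & "</ul>"

    rs.Close: cn.Close

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set htmlFile = fso.OpenTextFile("D:\WebSite\includes\headlines.html", ForWriting, True)
    htmlFile.Write html
    htmlFile.Close

The second sketch shows buffering and an Expires header in an ASP page; the one-hour lifetime is an arbitrary assumption, and Response.Expires is used here as a convenient alternative to Response.AddHeader.

    <%
    ' Sketch: buffer output and let the browser cache this response for one hour.
    Response.Buffer = True          ' on by default after a clean Windows 2000 install
    Response.Expires = 60           ' minutes until the client's cached copy expires

    ' ... expensive part of the page ...

    Response.Flush                  ' send what is ready so the user sees progress

    ' ... rest of the page ...
    %>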

Case Study: MSNBC


MSNBC is the 24-hour cable and Internet joint venture of Microsoft and NBC News. MSNBC.com is the number one news site with the fastest growth in frequency of downloads for all of 1999. It was voted "best Web site and most interactive" by PC Magazine and rated "best news portal" by Yahoo! Internet Life. MSNBC.com delivers the best of NBC News, MSNBC Cable, CNBC, and NBC Sports. Expanded news and feature reporting is provided to MSNBC.com users through strategic partnerships with Ziff Davis, The Wall Street Journal, MSN Money Central, The Sporting News, Expedia, E! Online, PencilNews, FEED magazine, Oncology.com, and APB News. MSNBC.com is a provider of broadband content for Road Runner, Sprint High Speed DSL, and Excite@Home, and programs interactive television with MSNBC Cable and NBC News.

The MSNBC.com Web site is hosted on 48 Web servers, each of which contains 1 GB of RAM running Windows 2000 Server. Newer servers have four 500 MHz Pentium III processors and older servers have two 400 MHz Pentium II processors.

Each of the eight IP addresses in the Domain Name System (DNS) entry of the site is a virtual IP address that represents six identical Web servers (Figure 19.9). When a user makes a request of the MSNBC.com site, the site uses NLB to direct the request to one of the eight virtual IP addresses, which then redirects it to one of the six servers in the cluster. If a page contains a number of different elements, such as graphics or inline frames, it is possible that each element could come from a different server. Figure 19.9 shows the hardware infrastructure for MSNBC.com.


Figure 19.9 MSNBC.com hardware infrastructure 

Site administrators use NLB when they need to take down servers for repairs or upgrades without affecting traffic to the site. Because the site typically runs at about 50 percent of capacity, it can withstand the loss of several servers before users will notice performance degradation.

The Web pages on MSNBC.com contain a great deal of dynamic content. Servers assemble almost every page on the site from various databases each time a user makes a request. Figure 19.10 shows the MSNBC.com home page.


Figure 19.10 MSNBC.com home page 

This home page contains the following dynamic elements:

  • An ActiveX control that supplies the site's navigation menu on the left side of the page. First, a user clicks a link on the home page to go to a section of the site. Then, when the user points to a menu item on a subject page, a submenu appears, offering access to the stories and subsections within that section. 

  • The latest Dow Jones Industrial Average figures, updated throughout the business day. 

  • Several advertisements, retrieved from databases. 

  • Local news and weather content tailored to the user. 

A dynamic Web site like MSNBC.com puts a great deal of demand on its network and computing hardware and software. MSNBC uses ASP technology to serve stories and graphics. The Web servers communicate with four SQL Server servers on the back end that store data.

As a news site with a reputation for bringing news to readers quickly, MSNBC.com regularly experiences usage spikes during periods of significant, breaking news. Sometimes, site operators can anticipate these spikes, but at other times spikes are completely unexpected. Site operators must ensure that the site has enough capacity to handle sudden, unanticipated demand.

On a typical day, MSNBC.com runs at about 50 percent of total capacity. During times of high demand, it sometimes nears or exceeds capacity. When this happens, site operators have to take additional steps to handle the demand, such as decreasing the amount of dynamic content on the site.

Load Monitoring Tools

MSNBC uses performance monitoring tools (such as System Monitor) in Windows 2000 to monitor load and performance, including the number of concurrent users accessing the site. If any of the System Monitor counters routinely exceed the recommended baseline, site operators consider upgrading server hardware. The Active Server Pages: Requests queued counter is especially important, due to the high level of ASP content on MSNBC.com. If the number of requests in the ASP queue reaches approximately 300, site developers simplify or reduce ASP content or add more hardware.

Planning for the Future

As a rule, the production team performs growth planning every three months or so by looking at historical trends and upcoming events. By comparing page views for the current month with figures from the previous year, and taking into account such things as big news stories, they can spot growth trends. Sometimes site operators can make educated guesses about upcoming major growth periods and plan accordingly. For example, two events in the late summer and fall of 2000—the Olympic Games in Sydney, Australia and the U.S. presidential election—caused long periods of heavy, sustained use. MSNBC added capacity through the spring and summer to handle the increase. Adding partnerships and linking arrangements can also result in increased demand.

Although MSNBC is positioned to adequately handle current typical and peak levels of traffic, proper capacity planning requires attention to the future as well. MSNBC is considering several options to handle expected levels of growth.

Currently, MSNBC serves all Web traffic from its Canyon Park data center in Bothell, Washington. However, in the future, MSNBC plans to open a satellite data center in Santa Clara, California, to take advantage of the high-capacity Internet infrastructure and peering arrangements available in and around Silicon Valley. Over the long term, site operators plan to serve Web traffic from data centers on the East Coast as well, to better serve users in the eastern half of the United States.

Future plans might also include a caching scheme, such as a separate set of servers for graphics or a reverse proxy setup. A proxy server or servers might then be able to intercept all page requests from the Internet, and serve static content from their own disks or RAM, passing ASP requests on to existing server clusters. This approach could be used in combination with a distribution scheme, a central server that distributes requests to several reverse proxies around the United States.

A similar approach might be to use a caching service. MSNBC could contract with such a service to serve static content, such as graphics, from its own geographically dispersed network of servers.

Best Practices

MSNBC.com uses the following best practices to address capacity planning:

  • Page weight standards. Content developers for MSNBC try to stay between 150 and 200 KB per page, including all content and graphics. Graphics are responsible for most of the page size and loading time. 

  • Tuning content. During periods of high demand, MSNBC takes steps to reduce the amount of dynamic content in order to increase the capacity of its servers. This is usually done by moving to a "light" version of the site in which many of the ASP pages are replaced by static HTML pages. 

Tools


Microsoft offers a number of tools for performance tuning and testing. Some of these tools are included with Windows 2000 and IIS 5.0, others are offered on the Windows 2000 Resource Kit CD, and still others are available on the Microsoft Web site ( https://www.microsoft.com/technet/archive/itsolutions/intranet/downloads/webstres.mspx ). For example, System Monitor (formerly called Performance Monitor) is built in to Windows 2000 and is essential to monitoring nearly every aspect of server performance.

This section briefly describes the following tools:

  • Microsoft Web Application Stress (WAS) tool

  • Network Monitor

  • SQL Profiler 

  • System Monitor 

  • Visual Studio Analyzer 

In addition to these tools, you might also consider using the tools listed in the following table.

Tool

Description

Process and Thread Status (Pstat.exe)

Shows the status of all running processes and threads. Pstat.exe is available on the Windows 2000 Server Resource Kit companion CD.

Process Tree (Ptree.exe)

Queries the process inheritance tree and shuts down processes on local or remote computers. Ptree.exe is available on the Windows 2000 Server Resource Kit companion CD.

HTTP Monitoring

Monitors HTTP activity on your servers and can notify you if there are changes in the amount of activity. HTTP Monitoring is available on the Windows 2000 Resource Kit companion CD.

NBTStat

Detects information about your server's current network connections. For more information about NBTStat, see https://www.microsoft.com/technet/archive/itsolutions/intranet/downloads/webstres.mspx .

Microsoft Web Application Stress Tool

The Microsoft Web Application Stress (WAS) tool is a simulation tool developed by Web testers to realistically reproduce multiple browsers requesting pages from a Web application. Microsoft has made the tool easy to use by masking some of the complexities of Web server testing. This makes the WAS tool useful for anyone interested in gathering performance data on a Web site.

The WAS tool is a consolidation of many of the best features developed over the years, as well as a few new features. In addition, this version covers the most needed features for stress testing three-tiered, personalized, ASP page sites running on Windows 2000. To download WAS, see https://www.microsoft.com/technet/archive/itsolutions/intranet/downloads/webstres.mspx .

Microsoft Network Monitor

Network Monitor (Netmon.exe) is a Windows 2000 administrative tool you can use to monitor network traffic. It is not installed by default, but you can install it by using the Add/Remove Programs option in the Control Panel.

Network Monitor captures network traffic for display and analysis. You can use it to perform tasks such as analyzing previously captured data in user-defined methods, extracting data from defined protocol parsers, and analyzing real-time traffic on your network.

Network Monitor is useful for capturing packets between browsers, Web servers, and SQL Server. It provides valuable timing information as well as packet size, network utilization, and many other statistics that can be valuable for managing system performance. For more information about Network Monitor, see Microsoft Windows 2000 Help.

SQL Profiler

SQL Profiler is a tool that captures Microsoft SQL Server events from a server. The events are saved in a trace file that you can analyze later or use to replay a specific series of steps when you are trying to diagnose a problem. You can use SQL Profiler to:

  • Step through problem queries to find the cause of the problem.

  • Find and diagnose slow-running queries.

  • Capture the series of SQL statements that lead to a problem. The saved trace can then be used to replicate the problem on a test server where the problem can be diagnosed.

  • Monitor the performance of SQL Server to tune database performance. 

For more information about SQL Profiler, see https://www.microsoft.com/sql/techinfo/tips/administration/parastatements.asp .

System Monitor

With System Monitor, you can collect and view extensive data about the ways in which hardware resources are used and the activities of system services on your site. You can use System Monitor to:

  • Collect and view real-time performance data on a local computer or from several remote computers.

  • View data in a counter log that is either being collected currently or was collected previously.

  • Present data in a printable graph, histogram, or report view.

  • Incorporate System Monitor functionality into Microsoft Word or other applications in the Microsoft Office suite by means of automation.

  • Create HTML pages from performance views.

  • Create reusable monitoring configurations that can be installed on other computers using the Microsoft Management Console (MMC).

For more information about System Monitor, see https://support.microsoft.com/default.aspx?scid=KB;en-us;248345&sd=tech .

Visual Studio Analyzer

You can use Visual Studio Analyzer, a tool in Microsoft Visual Studio 6.0 Enterprise Edition, to analyze performance, isolate page faults, and understand the structure of your distributed applications. You can use Visual Studio Analyzer with applications and systems built with any of the Visual Studio tools.

Several Microsoft technologies, such as COM, ADO, and COM+, are shipped with the ability to provide information to Visual Studio Analyzer. If your applications use any of these technologies, you can get detailed information about this use in Visual Studio Analyzer. In addition, you can customize your own applications to provide information to Visual Studio Analyzer.

For more information about Visual Studio Analyzer, see https://msdn.microsoft.com/library/en-us/vsavs70/html/veoriVisualStudioAnalyzerInBetaPreview.asp .
