Site Server - Update Incorporating SQL Server 7.0 and Xeon Architecture

August 1999

Introduction

This document is an update to the Microsoft Site Server 3.0 Commerce Edition Performance and Capacity Analysis white paper that is included with the Microsoft® Site Server 3.0 Commerce Edition (SSCE) Resource Kit. The purpose of this update is to address changes in both software and hardware configurations that could impact performance calculations. These changes reflect the evolutionary improvements in technology that occur over time in the computer industry, such as faster processors, new versions of software, and the latest releases of service packs. A comparison between the two system configurations is shown below.

 

| Component | Old Configuration | New Configuration |
|---|---|---|
| System Software | Microsoft® Windows NT® Server 4.0, Service Pack 3 | Windows NT Server 4.0, Service Pack 4 |
| Database Software | Microsoft® SQL Server™ version 6.5, Service Pack 4 | SQL Server version 7.0, Service Pack 1 |
| CPU (SSCE) | 4 x 200-MHz Pentium Pro w/512 K L2 cache | 4 x 400-MHz Pentium II (Xeon) w/512 K L2 cache |
| CPU (SQL Server) | 4 x 200-MHz Pentium Pro w/512 K L2 cache | 4 x 400-MHz Pentium II (Xeon) w/512 K L2 cache |
| Disk (SSCE) | 2 x 4.3-GB SCSI-3 (10,000 RPM) | 2 x 4.3-GB SCSI LVD (10,000 RPM) |
| Disk (SQL Server) | 1 x 4.3-GB SCSI-3 (10,000 RPM) | 2 x 4.3-GB SCSI LVD (10,000 RPM); 20 x 9.1-GB SCSI LVD (10,000 RPM) |
| Memory (SSCE) | 384-MB ECC buffered EDO RAM | 2-GB ECC buffered EDO RAM |
| Memory (SQL Server) | 256-MB ECC buffered EDO RAM | 2-GB ECC buffered EDO RAM |
| Network | 100-BaseT Switched Ethernet | 100-BaseT Switched Ethernet |

This document compares the capacity of these two configurations by analyzing CPU and disk costs, using a revised and simplified approach to the Transaction Cost Analysis (TCA) methodology. You may be tempted to skip to the Summary of Capacity and Performance section later in this document and use the numbers found there as a guideline for designing a system configuration based on your own capacity requirements. However, more ambitious site builders will appreciate the value of studying the TCA methodology so that they can run their own tests and do their own performance and capacity planning analysis. [1]

Capacity planning for a service such as SSCE has many dependencies. These include hardware, system software, database software, ASP scripts, and usage profiles. The importance of understanding how different system configurations, different SSCE sites, and different usage characteristics will produce different results cannot be overemphasized.

The Active Server Pages (ASP) scripts used to build an SSCE site are unique to that site. ASP performance can vary widely, depending on the complexity and efficiency of the code. These variations will ultimately impact resource cost and capacity. A well-written ASP script can have a greater impact on capacity and performance than an evolutionary change in hardware.

User behavior will also vary from site to site, and this variation needs to be reflected in the Shopper Profile discussed later in this document. For example, one site may show that 20 percent of store visitors actually purchase a product while another site shows only 2 percent. Because the purchase operation is the most demanding operation that can be performed on an SSCE site (from the standpoint of computer resource usage), this change in the shopper profile will have a significant impact on shopper capacity.

In summary, the results of the analysis provided in this document will prove useful for those wanting to expedite the capacity planning process. However, to create a capacity plan with the highest level of confidence, use this document as a road map for a hands-on approach to the performance and capacity analysis of your own uniquely designed SSCE site.

Summary of Capacity and Performance

Performance and capacity for an SSCE site are largely a function of how efficiently ASP scripts can be processed. Because ASP processing is highly CPU intensive, SSCE capacity is reached when ASP processing consumes all available CPU resources.

In light of this, it should be noted that multi-processor SSCE servers do not make efficient use of CPU resources, due primarily to thread management on 2-processor and 4-processor computers. [2]

Having said this, the most straightforward approach to increasing SSCE capacity is to increase CPU resources, either by using faster processors or by adding more SSCE servers. See Appendix B in the Microsoft Site Server 3.0 Commerce Edition Performance and Capacity Analysis for a detailed comparison of 1-processor, 2-processor, and 4-processor SSCE server test results. For purposes of comparison, this document uses the test results for 4-processor SSCE system configurations.

Based on the data derived using the new hardware and software and presented in this document, the following assertions can be made about the performance of the Pentium Pro and Xeon configurations when hosting the Volcano Coffee sample site found in SSCE 3.0g [3]:

  • Switching from Pentium Pro to Xeon configurations has increased shopper capacity by approximately 140 percent (from 400 to 950 shoppers). 

  • The increase in shopper capacity for the Xeon configuration is by and large a result of the increase in available CPU cycles (800 MHz for the Pentium Pro and 1600 MHz for the Xeon). 

  • The Xeon configuration shows some increased efficiency in terms of CPU cost per shopper. Cost declines from 0.91 Mcycles/sec [4] on the Pentium Pro to 0.74 Mcycles/sec on the Xeon, a change of 19 percent. 

  • The Xeon configuration shows significant improvements in terms of disk cost. Each shopper generates approximately 0.016 disk seeks per second on the Pentium Pro configuration and 0.008 disk seeks per second on the Xeon configuration. However, because CPU rather than disk is the bottleneck, the improvement in disk performance is not a factor in the increased shopper capacity. 
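The percentages quoted in the list above follow from simple relative-change arithmetic. The Python sketch below reproduces them from the shopper capacities in this section and the CPU cost per shopper per second on the bottom line of Table 4. The function name percent_change is illustrative only and is not part of any SSCE tool.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0

# Shopper capacity: Pentium Pro (400 shoppers) versus Xeon (950 shoppers).
capacity_gain = percent_change(400, 950)

# CPU cost per shopper per second in Mcycles/sec (bottom line of Table 4).
cpu_cost_change = percent_change(0.9079, 0.7358)

print(f"Shopper capacity change: {capacity_gain:.1f}%")
print(f"CPU cost per shopper change: {cpu_cost_change:.1f}%")
```

A negative result for the CPU cost indicates a decline, which is the improvement described above.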

Capacity Comparisons

The following chart shows how the increased CPU power of the Xeon configuration impacts shopper capacity. Because CPU is the bottleneck with SSCE, [5] shopper capacity can be increased by adding CPU power in the form of faster processors. When CPU is at maximum available usage, [6] the Pentium Pro configuration supports 400 shoppers and the Xeon configuration supports 950 shoppers.

Chart 1 Comparing Shopper Capacity for Pentium Pro and Xeon Configurations
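The CPU ceiling for each configuration comes from the 40 percent maximum available CPU usage described in the footnote on that subject. The short Python sketch below shows that arithmetic, treating 1 MHz as 1 Mcycle per second. The function and constant names are illustrative assumptions.

```python
# Usable share of total CPU on a 4-processor SSCE server, per the footnote
# on maximum available CPU usage.
MAX_USABLE_FRACTION = 0.40

def available_mcycles_per_sec(processors: int, mhz_per_processor: float) -> float:
    """Usable CPU budget in Mcycles/sec (1 MHz is 1 million cycles per second)."""
    return processors * mhz_per_processor * MAX_USABLE_FRACTION

pentium_pro_budget = available_mcycles_per_sec(4, 200)  # 800 MHz in total
xeon_budget = available_mcycles_per_sec(4, 400)         # 1600 MHz in total

print(f"Pentium Pro: {pentium_pro_budget:.0f} Mcycles/sec")
print(f"Xeon: {xeon_budget:.0f} Mcycles/sec")
```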

Chart 2 compares resource usage for CPU and disk on a Xeon configuration. It shows that when CPU usage is maximized, disk is operating at approximately 2.75 percent of its capacity. [7] The implication is that CPU resources will be exhausted far sooner than disk resources.

Chart 2 Comparing Shopper Capacity for Xeon CPU and Xeon Disk
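The 2.75 percent figure can be checked from the two numbers given in the footnote on disk capacity: an average of 7.7 seeks per second at maximum CPU utilization, against a ceiling of 280 seeks per second. A minimal sketch, with illustrative names:

```python
MAX_DISK_SEEKS_PER_SEC = 280.0  # 100 percent disk utilization, per the footnote

def disk_utilization_percent(seeks_per_sec: float) -> float:
    """Share of the disk seek ceiling in use, as a percentage."""
    return seeks_per_sec / MAX_DISK_SEEKS_PER_SEC * 100.0

# Average seek rate on the SQL Server data disk when CPU is at maximum.
utilization = disk_utilization_percent(7.7)
print(f"Disk utilization: {utilization:.2f}%")
```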

Capacity and Performance Detail

Shopper Profile

This Shopper Profile is also found in the Microsoft Site Server 3.0 Commerce Edition Performance and Capacity Analysis white paper, which is part of the Site Server 3.0 Commerce Edition Resource Kit. Shopper operations shown here are based on the shopper operations included in the Volcano Coffee (VC) sample site, which is part of SSCE 3.0.

This profile identifies the behavior of an average shopper, who performs 19 shopper operations during a 20-minute session. The Profile Performance Rate (PPR) for each operation is calculated by converting its transactions/session value into transactions/second, that is, by dividing by the session length of 1200 seconds. For example, the average shopper performs 1.5 Additem operations per session, so the PPR for Additem is 1.5 / 1200, or 0.001250 transactions per second. For all 19.0 operations combined, the PPR is 19.0 / 1200, or approximately 0.0158 transactions per second.
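The conversion from transactions per session to a Profile Performance Rate can be sketched in a few lines of Python. The names are illustrative, and only a few rows of the Shopper Profile are included.

```python
SESSION_SECONDS = 20 * 60  # a 20-minute session is 1200 seconds

def profile_performance_rate(transactions_per_session: float) -> float:
    """Convert transactions per session into transactions per second."""
    return transactions_per_session / SESSION_SECONDS

# A few rows of the Shopper Profile (transactions per session).
profile = {"Additem": 1.5, "Basket": 2.0, "Checkout": 0.5, "Product": 6.0}

for operation, per_session in profile.items():
    rate = profile_performance_rate(per_session)
    print(f"{operation}: {rate:.6f} trans/sec")
```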

Table 1 Shopper Profile Used in this Report 

| VC Shopper Operation | Transactions/session | Profile Performance Rate (trans/sec) |
|---|---|---|
| Additem | 1.5 | 0.001250 |
| Basket | 2.0 | 0.001667 |
| Checkout | 0.5 | 0.000417 |
| Clearitems | 0.5 | 0.000417 |
| Default | 1.0 | 0.000833 |
| Delitem | 0.5 | 0.000417 |
| Listing | 0.5 | 0.000417 |
| Lookup | 1.0 | 0.000833 |
| Main | 2.5 | 0.002084 |
| Product | 6.0 | 0.005000 |
| Search | 2.0 | 0.001667 |
| Welcome | 1.0 | 0.000833 |
| Total for All Operations | 19.0 | 0.015827 |

ASP Performance Comparisons

In the following table, the optimum performance rate [8] for each shopper operation is compared for Pentium Pro and Xeon configurations. These performance rates are also compared in Chart 3 (taller bars indicate greater performance).

Table 2 Comparing ASP Performance (ASP Requests/sec) for Pentium Pro and Xeon Configurations 

| VC Shopper Operation | ASP requests/sec (Pentium Pro) | ASP requests/sec (Xeon) | % Improvement (from Pentium Pro to Xeon) |
|---|---|---|---|
| Additem | 3.897 | 5.777 | 48.24% |
| Basket | 7.886 | 8.252 | 4.64% |
| Checkout | 3.754 | 8.521 | 126.97% |
| Clearitems | 7.169 | 9.373 | 30.75% |
| Default | 28.895 | 44.850 | 55.22% |
| Delitem | 4.570 | 7.168 | 56.85% |
| Listing | 3.272 | 7.074 | 116.20% |
| Lookup | 10.988 | 28.848 | 162.54% |
| New | 9.722 | 29.591 | 204.37% |
| Main | 4.414 | 14.670 | 232.35% |
| Product | 3.305 | 8.737 | 164.36% |
| Search | 7.555 | 14.051 | 85.98% |
| Welcome | 32.523 | 104.262 | 220.58% |

Chart 3 Comparing ASP Performance (ASP Requests/sec) for Pentium Pro and Xeon Configurations 
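The "% Improvement" column in Table 2 appears to be the relative increase in ASP requests per second from the Pentium Pro to the Xeon configuration. The sketch below shows that calculation for two rows, using the rounded rates printed in the table. Because the inputs are rounded, the last digit for some other rows may differ slightly from the published column. The function name is illustrative.

```python
def percent_improvement(pentium_pro_rate: float, xeon_rate: float) -> float:
    """Relative gain in ASP requests/sec from Pentium Pro to Xeon, in percent."""
    return (xeon_rate - pentium_pro_rate) / pentium_pro_rate * 100.0

# (Pentium Pro, Xeon) ASP requests/sec for two rows of Table 2.
rates = {"Additem": (3.897, 5.777), "Basket": (7.886, 8.252)}

for operation, (ppro, xeon) in rates.items():
    print(f"{operation}: {percent_improvement(ppro, xeon):.2f}%")
```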

Processor and Disk Costs

Processor and disk costs are calculated from CPU utilization and ASP requests per second. Cost represents the total number of CPU cycles or disk seeks required to perform a single transaction. These costs are compared for Pentium Pro and Xeon configurations. [9]
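This document does not spell out the cost formula itself. The sketch below assumes the usual TCA relationship: the cycles consumed per second (utilization multiplied by total processor speed) divided by the requests served per second, with the disk cost derived the same way from the measured seek rate. The function names and the sample measurement are hypothetical and are not taken from the test data.

```python
def cpu_cost_mcycles(cpu_utilization: float, processors: int,
                     mhz_per_processor: float, requests_per_sec: float) -> float:
    """Mcycles consumed per request, under the assumed TCA relationship.

    cpu_utilization is a fraction between 0 and 1; 1 MHz is 1 Mcycle/sec.
    """
    total_mcycles_per_sec = processors * mhz_per_processor
    return cpu_utilization * total_mcycles_per_sec / requests_per_sec

def disk_cost_seeks(seeks_per_sec: float, requests_per_sec: float) -> float:
    """Disk seeks per request, derived from the measured seek rate."""
    return seeks_per_sec / requests_per_sec

# Hypothetical measurement (not from this document): a 4 x 400-MHz server
# at 50 percent CPU, serving 10 requests/sec with 5 disk seeks/sec.
example_cpu_cost = cpu_cost_mcycles(0.50, 4, 400, 10.0)
example_disk_cost = disk_cost_seeks(5.0, 10.0)
print(example_cpu_cost, example_disk_cost)
```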

Table 3 Comparing CPU and Disk Cost for Pentium Pro and Xeon Configurations 

| VC Shopper Operation | CPU Cost, PPro (Mcycles) [10] | CPU Cost, Xeon (Mcycles) | Disk Cost, PPro (seeks) [11] | Disk Cost, Xeon (seeks) |
|---|---|---|---|---|
| Additem | 53.350 | 73.541 | 1.971 | 0.821 |
| Basket | 32.091 | 52.696 | 0.407 | 1.146 |
| Checkout | 176.195 | 125.193 | 23.783 | 6.149 |
| Clearitems | 29.552 | 36.658 | 1.070 | 1.475 |
| Default | 5.622 | 5.665 | 0.008 | 0.000 |
| Delitem | 47.464 | 50.609 | 0.713 | 1.378 |
| Listing | 64.518 | 30.093 | 0.000 | 0.000 |
| Lookup | 17.182 | 10.295 | 0.015 | 0.000 |
| New | 20.935 | 12.269 | 0.796 | 0.526 |
| Main | 66.300 | 43.168 | 0.037 | 0.054 |
| Product | 81.048 | 55.050 | 0.047 | 0.126 |
| Search | 32.119 | 43.886 | 0.012 | 0.000 |
| Welcome | 6.270 | 4.544 | 0.003 | 0.000 |

The following chart uses the data from Table 3 to compare cost for each shopper operation for the Pentium Pro and Xeon configurations. Shorter bars indicate more efficient use of CPU resources (lower cost of operation).

Chart 4 CPU Cost by ASP for Xeon and Pentium Pro Configurations 

Processor and Disk Calculations

CPU and disk costs found in Table 3 are multiplied by the Profile Performance Rate (PPR) found in the Shopper Profile (Table 1) to create weighted CPU and disk costs for each operation as shown in the table below (Table 4). For example, the CPU cost (Pentium Pro configuration) for the Additem operation is shown in Table 3 to be 53.350 Mcycles. The Shopper Profile indicates that the average shopper will generate 0.001250 Additem operations per second. This creates a weighted cost of 53.350 * 0.001250, or 0.0667 Mcycles per second.
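The Additem example above can be written as a short Python sketch. The variable names are illustrative; the two input values are the ones quoted in the example.

```python
# Additem on the Pentium Pro configuration.
additem_cpu_cost = 53.350  # Mcycles per transaction (Table 3)
additem_ppr = 0.001250     # transactions per second per shopper (Table 1)

# Weighted cost: CPU work that one average shopper generates each second
# through Additem operations.
weighted_cpu_cost = additem_cpu_cost * additem_ppr
print(f"Weighted CPU cost: {weighted_cpu_cost:.4f} Mcycles/sec")
```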

The sum of the weighted costs provides a CPU cost and a disk cost per shopper per second (K), which can be used in simple formulas to predict capacity (the maximum number of concurrent shoppers).

Table 4 Comparing Cost per Shopper (K) for Pentium Pro and Xeon Configurations 

| VC Shopper Operation | Weighted CPU Cost, PPro (Mcycles/sec) [12] | Weighted CPU Cost, Xeon (Mcycles/sec) | Weighted Disk Cost, PPro (seeks/sec) [13] | Weighted Disk Cost, Xeon (seeks/sec) |
|---|---|---|---|---|
| Additem | 0.0667 | 0.0919 | 0.002464 | 0.001026 |
| Basket | 0.0535 | 0.0878 | 0.000678 | 0.001910 |
| Checkout | 0.0735 | 0.0522 | 0.009918 | 0.002564 |
| Clearitems | 0.0123 | 0.0153 | 0.000446 | 0.000615 |
| Default | 0.0047 | 0.0047 | 0.000007 | 0.000000 |
| Delitem | 0.0198 | 0.0211 | 0.000297 | 0.000575 |
| Listing | 0.0269 | 0.0125 | 0.000000 | 0.000000 |
| Lookup | 0.0143 | 0.0086 | 0.000012 | 0.000000 |
| New | 0.0436 | 0.0256 | 0.001659 | 0.001096 |
| Main | 0.3315 | 0.2158 | 0.000185 | 0.000270 |
| Product | 0.1351 | 0.0918 | 0.000078 | 0.000210 |
| Search | 0.0268 | 0.0366 | 0.000010 | 0.000000 |
| Welcome | 0.0992 | 0.0719 | 0.000047 | 0.000000 |
| Cost per Shopper per Second (K) | 0.9079 | 0.7358 | 0.015802 | 0.008267 |
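The bottom line of Table 4 is the column sum of the weighted costs. The sketch below adds up the two weighted CPU cost columns as printed in the table; the list names are illustrative. The disk columns can be summed the same way, although rounding in the printed values may shift their last digit.

```python
# Weighted CPU cost per operation in Mcycles/sec, in Table 4 row order
# (Additem through Welcome).
weighted_cpu_ppro = [0.0667, 0.0535, 0.0735, 0.0123, 0.0047, 0.0198, 0.0269,
                     0.0143, 0.0436, 0.3315, 0.1351, 0.0268, 0.0992]
weighted_cpu_xeon = [0.0919, 0.0878, 0.0522, 0.0153, 0.0047, 0.0211, 0.0125,
                     0.0086, 0.0256, 0.2158, 0.0918, 0.0366, 0.0719]

# K: CPU cost per shopper per second for each configuration.
k_cpu_ppro = sum(weighted_cpu_ppro)
k_cpu_xeon = sum(weighted_cpu_xeon)

print(f"K (CPU, Pentium Pro): {k_cpu_ppro:.4f} Mcycles/sec")
print(f"K (CPU, Xeon): {k_cpu_xeon:.4f} Mcycles/sec")
```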

Processor and Disk Equations

In the previous table (Table 4), the cost per shopper per second (K) is calculated for Xeon and Pentium Pro configurations. These values can be plugged into equations to calculate shopper capacity using the following formula:

Capacity (C) = Number of Shoppers (N) * Cost per Shopper per Second (K)

Below are the capacity equations for processor and disk created from the calculations for cost per shopper per second (K) that are shown on the bottom line of Table 4. Each equation is bound by a maximum value, which is equivalent to the maximum number of Mcycles and disk seeks available for each system.

For Xeon:

C_CPU = Min [ (N * 0.7340), 640 ]

C_DSK = Min [ (N * 0.0089), 280 ]

For Pentium Pro:

C_CPU = Min [ (N * 0.8632), 320 ]

C_DSK = Min [ (N * 0.0130), 280 ]
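The bounded equations above can be evaluated for any shopper load with a small helper. The Python sketch below uses the coefficients and ceilings exactly as they appear in the equations. The function name, the dictionary layout, and the sample loads are illustrative assumptions.

```python
def resource_usage(shoppers: int, cost_per_shopper: float, ceiling: float) -> float:
    """C = Min[(N * K), ceiling]: projected usage, capped at the resource maximum."""
    return min(shoppers * cost_per_shopper, ceiling)

# Coefficients (K) and ceilings as printed in the equations above.
# CPU is in Mcycles/sec; disk is in seeks/sec.
XEON = {"cpu_k": 0.7340, "cpu_max": 640.0, "disk_k": 0.0089, "disk_max": 280.0}
PENTIUM_PRO = {"cpu_k": 0.8632, "cpu_max": 320.0, "disk_k": 0.0130, "disk_max": 280.0}

for name, cfg in (("Xeon", XEON), ("Pentium Pro", PENTIUM_PRO)):
    for shoppers in (100, 500, 2000):  # sample loads, chosen for illustration
        cpu = resource_usage(shoppers, cfg["cpu_k"], cfg["cpu_max"])
        disk = resource_usage(shoppers, cfg["disk_k"], cfg["disk_max"])
        print(f"{name}, N={shoppers}: CPU {cpu:.1f} Mcycles/sec, disk {disk:.2f} seeks/sec")
```

At high loads the CPU value stops at its ceiling while the disk value is still far below 280 seeks per second, which is the behavior the equations are meant to capture.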

The following chart (Chart 5) shows the CPU usage for Pentium Pro and Xeon configurations based on the previously constructed CPU equations. The Pentium Pro reaches CPU capacity (320 Mcycles/sec) when shopper load is 400. The Xeon's CPU capacity (640 Mcycles/sec) is reached when the shopper load is 950. The increased shopper capacity of the Xeon configuration results from a lower CPU cost per shopper as well as a higher maximum number of Mcycles available.

Chart 5 Projected CPU Costs Based on Shopper Load for Pentium Pro and Xeon 

In the following chart (Chart 6), disk cost is compared for Xeon and Pentium Pro configurations. The upper limit for each configuration is based on the values for shopper capacity calculated for CPU (400 shoppers for the Pentium Pro and 950 shoppers for the Xeon). In both configurations, the disk is operating well below the maximum disk capacity of 280 seeks/sec.

Chart 6 Projected Disk Costs Based on Shopper Load for Pentium Pro and Xeon 

1 For information about using TCA with SSCE, refer to the Microsoft Site Server 3.0 Commerce Edition Performance and Capacity Analysis white paper included with the Site Server 3.0 Commerce Edition Resource Kit.

2 By default, Microsoft Internet Information Server (IIS) allocates a maximum of 10 threads per processor. If thread contention is an issue, increasing the number of threads in the IIS thread pool can improve ASP performance. However, more threads in the IIS thread pool result in more context switching, which causes additional CPU overhead. So the optimum relationship between thread pool size and context switching must be calibrated carefully. Ideally, thread pool size should not push CPU utilization beyond 70 percent.

3 Note that these results also require the use of the proposed Shopper Profile found in the "Capacity and Performance Detail" section of this document. Different shopper profiles can produce considerably different results.

4 The Mcycle is the unit of processor work used in this document. One Mcycle is equal to one million CPU cycles. As a unit of measure, the Mcycle is useful for comparing performance between processors because it is hardware independent.

5 For further discussion of the CPU bottleneck in SSCE, refer to the Microsoft Site Server 3.0 Commerce Edition Performance and Capacity Analysis white paper.

6 Maximum available CPU usage for a 4-processor SSCE server has been determined to be 40 percent, which is equivalent to 320 MHz (320 Mcycles/sec) for the Pentium Pro (800 MHz x 40%) and 640 MHz (640 Mcycles/sec) for the Xeon (1600 MHz x 40%).

7 Disk capacity is defined in terms of the maximum number of disk seeks that can be performed on the SQL Server data partition. For both Xeon and Pentium Pro configurations, 100 percent disk utilization is equal to 280 seeks per second. When CPU utilization is running at maximum, the SQL Server data disk is performing, on average, 7.7 disk seeks per second, which is well below capacity.

8 Optimum performance rate is approximately equal to maximum ASP throughput, although it is typically lower. For purposes of Transaction Cost Analysis, each shopper operation is tested to determine optimum performance rate (in terms of ASP requests per second) for a specific system configuration. As load is increased, throughput increases and resource cost remains fairly constant. However, when a certain threshold is reached, throughput continues to increase somewhat, but resource cost increases geometrically. The optimum performance rate is determined to be the point at which ASP throughput is greatest prior to the geometric increase in cost.

9 Lower cost per transaction translates to more transactions for a given number of CPU cycles, which (with all other things being equal) translates to greater capacity. However, because the Xeon configuration provides twice as many CPU cycles for processing transactions, capacity for the Pentium Pro and Xeon configurations will ultimately be determined not only by cost per transaction but by maximum cycles available for processing transactions.

10 CPU cost is measured in terms of Mcycles required to perform a single transaction, where one Mcycle is equal to one million CPU cycles.

11 Disk cost is measured in terms of disk seeks required to perform a single transaction.

12 Weighted CPU cost is measured in terms of Mcycles/sec.

13 Weighted disk cost is measured in terms of disk seeks/sec.