Microsoft Commerce Server 2000: Site Management

During the Management phase, you continue to monitor, test, and resolve problems in the hardware, software, and content of your site. You analyze the data you collect by monitoring site activity, and then use that data to improve site performance from both a technological and a marketing perspective. Finally, you create and perform operational procedures, such as backup, recovery, and log capture, to administer the day-to-day operation of your site. Figure 17.1 shows a high-level view of the management process.


Figure 17.1 High-level view of the site management process 

This chapter describes how to:

  • Perform a site checkup 

  • Monitor and analyze log data 

  • Set up and perform the operational procedures necessary to manage your site 

In addition, the Management section of this book contains the chapters listed in the following table.

| Chapter | Title | Description |
|---|---|---|
| 18 | Problem Management | Best practices for managing problems and troubleshooting your Microsoft Commerce Server 2000 site |
| 19 | Maximizing Performance | Methods for creating site usage profiles, and analyzing and managing site performance |

Performing a Site Checkup


E-commerce sites are dynamic, and the requirements for a successful site can change dramatically over time. For example, your product line might change, site visitor usage profiles might shift, and from time to time you need to introduce new software, new catalogs, and other new content. Over time, these changes can affect the stability of your site.

It is a good idea to periodically conduct a site "checkup" to make sure that everything is working properly. A good time to conduct the checkup is prior to gearing up for the holiday shopping season, to give you time to correct any problems that might have crept into your site during the previous year, and to be sure that your customers have the best possible shopping experience.

Your site checkup should be a collaborative process involving your system administrators, development staff, and business management. If other companies are developing or managing your site, include them as well. The following table lists the types of questions to ask during the checkup, grouped by category. You might also have additional questions specific to your site.

Category

Questions

Comments

Backup and maintenance

· What procedures do we have for rebuilding services?
· What are our database maintenance procedures?
· When was the last time we rebuilt the site from scratch?
· Do we have off-site storage for critical databases?

Regular backup and maintenance procedures ensure that you can identify and access all parts of your site, if necessary. Reconstructing a site is an effective way to make sure all the parts are available.

Event logs

· What warnings and errors have been occurring in our system and application event logs?
· How do current warnings and errors compare to those reported in previous periods?

Event logs provide a useful gauge for the health of your system.

Load

· What is our current site load for the following:
    · Transactions per day
    · Peak concurrent shoppers
    · CPU utilization on Web servers
· How has our site load changed over the past year?
· What have we done to accommodate load changes?
· What incremental load do we anticipate during the holiday season and during the coming year?
· What changes do we have to make to accommodate increases in load in the following areas:
    · Hardware
    · Network
    · Monitoring tools

Carefully managing site load is an effective means of improving site stability. Many sites experience large increases in traffic during the holiday season or as a result of advertising campaigns. Simulating the projected load is an effective way to make sure your site can handle the increased load.

Security

· When was our most recent security audit?
· Have we applied the most recent security patches?

Security requires constant vigilance.

Software changes

· When was the last version or service pack applied?
· When was the last platform upgrade?
· Are we planning any platform or application changes? If so:
    · What validation procedures are in place?
    · What are our procedures for backing out changes?
· Have we reviewed our application code against the latest software best practices?
· When was our last software audit of the production servers?

Software upgrades, no matter how simple, introduce change and threaten system stability. Over time, software can drift from the specified configuration, making it difficult to identify problems and impossible to rebuild a server for debugging purposes.

Software problems

· How many software problems have been reported since the last version or service pack was installed?
· How has the "find rate" for problems changed over the past year?

Problem counts are an effective way to track site quality. Although there will always be problems, a stable site should show a decreasing problem find rate.

Stability and availability

· How stable is our site, on a scale of 1 to 5?
· How often do we reboot our Web servers, and why? Has the frequency increased within the past three months?
· How are we eliminating single points of failure in the following:
    · Web servers
    · Database servers
    · Power supplies
    · Network access
· What is our disaster recovery plan?

Eliminating single points of failure can dramatically increase site availability. For suggestions for eliminating single points of failure, see Chapter 6, "Planning for Reliability and High Availability."

Support

· Is our staffing and problem escalation planning adequate for our availability requirements?
· Do we have current support contracts and contacts for any third-party software that we are using?
· Do we have a separate test environment?

The ability to efficiently report and escalate problems to knowledgeable sources is important for ensuring a healthy site.
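The "find rate" and period-comparison questions in the table come down to simple counting. The following Python sketch assumes you can export the dates of reported problems (or of logged warnings and errors) from your tracking system as a list; the function names are hypothetical and are not part of Commerce Server.

```python
from collections import Counter
from datetime import date

def find_rate_by_month(report_dates):
    """Count problem reports per (year, month), in chronological order.

    Months with no reports are omitted; add them as zero counts if you
    need a continuous series.
    """
    counts = Counter((d.year, d.month) for d in report_dates)
    return [(month, counts[month]) for month in sorted(counts)]

def is_declining(rates):
    """True if no period reports more problems than the period before it."""
    values = [count for _, count in rates]
    return all(later <= earlier for earlier, later in zip(values, values[1:]))

# Illustrative data: problems reported since the last service pack was installed.
reports = [date(2001, 1, 5), date(2001, 1, 9), date(2001, 1, 20),
           date(2001, 2, 3), date(2001, 2, 17), date(2001, 3, 11)]
rates = find_rate_by_month(reports)
print(rates)                # [((2001, 1), 3), ((2001, 2), 2), ((2001, 3), 1)]
print(is_declining(rates))  # True
```

The same counting answers the event-log question: feed in the dates of warnings and errors instead, and compare the current period's counts with earlier ones.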

Monitoring and Analyzing Log Data


For very serious errors and events, you should install an automated alarm system that continuously monitors your log files and sends notification (for example, an e-mail or pager message) when a particular type of error or event occurs. Your contingency plans should include notification and action scenarios. For more information about developing a contingency plan, see Chapter 14, "Deploying Your Site."

An alarm system scans for predefined errors and events by continuously monitoring the data written to any log files that you specify. In addition to errors and events, alarm systems can check for highs and lows in performance counters. You can configure an alarm system by setting priorities and responses for error, event, and failure information.

Your alarm system should respond to:

  • Backups that fail 

  • System resources that become dangerously low

  • Services that stop unexpectedly 

  • Events or system states that can affect site functionality 
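Conditions such as "dangerously low" resources are usually expressed as high and low thresholds on performance counters. A minimal Python sketch of that check, assuming the counter values have already been collected (for example, from System Monitor) into a dictionary; the counter keys, thresholds, and names such as `ThresholdRule` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class ThresholdRule:
    counter: str                   # key used for the counter in the samples
    priority: str                  # for example "critical" or "warning"
    low: Optional[float] = None    # alarm if the value falls below this
    high: Optional[float] = None   # alarm if the value rises above this

def check_thresholds(samples: Dict[str, float],
                     rules: List[ThresholdRule]) -> List[Tuple[str, str]]:
    """Return (priority, message) for every counter outside its allowed range."""
    alarms = []
    for rule in rules:
        value = samples.get(rule.counter)
        if value is None:
            continue  # counter was not sampled this round
        if rule.low is not None and value < rule.low:
            alarms.append((rule.priority, f"{rule.counter} low: {value} < {rule.low}"))
        if rule.high is not None and value > rule.high:
            alarms.append((rule.priority, f"{rule.counter} high: {value} > {rule.high}"))
    return alarms

# Illustrative counter keys and thresholds; derive real ones from baseline data.
rules = [
    ThresholdRule("free disk % (C:)", "critical", low=10.0),
    ThresholdRule("CPU % (web01)", "warning", high=90.0),
]
samples = {"free disk % (C:)": 6.5, "CPU % (web01)": 42.0}
for priority, message in check_thresholds(samples, rules):
    print(priority, message)  # critical free disk % (C:) low: 6.5 < 10.0
```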

In addition to sending e-mail and pager messages, your alarm system might log the event or notification in a special file, run a script or program to correct the problem (for example, restarting a service), or log ancillary data to help you troubleshoot the error. For more information about system monitoring, see Chapter 19, "Maximizing Performance."
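The responses described above can be modeled as rules that pair a pattern with a priority and a list of actions. This Python sketch scans a batch of log lines; a real alarm system would feed it lines as they are written. The rule names, patterns, sample log lines, and the `notify` and `record` stubs are hypothetical stand-ins for your own notification and corrective-action code.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class AlarmRule:
    name: str
    pattern: re.Pattern
    priority: int                      # 1 = most urgent
    actions: List[Callable[[str, str], None]] = field(default_factory=list)

def scan_lines(lines, rules):
    """Run each matching rule's actions; return (priority, rule, line), most urgent first."""
    matches = []
    for line in lines:
        for rule in rules:
            if rule.pattern.search(line):
                for action in rule.actions:
                    action(rule.name, line)
                matches.append((rule.priority, rule.name, line.strip()))
    return sorted(matches)

# Hypothetical actions: replace with e-mail, pager, or service-restart logic.
alarm_file = []  # stands in for a special alarm log file

def notify(rule_name, line):
    print(f"NOTIFY [{rule_name}] {line.strip()}")

def record(rule_name, line):
    alarm_file.append(f"{rule_name}\t{line.strip()}")

rules = [
    AlarmRule("backup-failed", re.compile(r"backup .*failed", re.I), 1, [notify, record]),
    AlarmRule("service-stopped", re.compile(r"service .*stopped unexpectedly", re.I), 1, [notify, record]),
    AlarmRule("low-disk", re.compile(r"disk space (is )?low", re.I), 2, [record]),
]

log = [
    "02:00 Nightly backup of the catalog database failed: device not ready",
    "02:05 Disk space is low on drive D:",
    "02:10 Scheduled job completed",
]
results = scan_lines(log, rules)
for priority, name, line in results:
    print(priority, name)
```

Keeping the actions as plain callables makes it easy to attach more than one response to a rule, such as notifying an operator and also writing the event to the alarm file.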

Analyzing Log Data

The log files created by your software applications record site usage, operational events, and performance data, as well as errors and warnings. Log files store the history of events within a system, and are often the only way to detect and trace an intrusion by a hacker. You can use the data in log files to diagnose server problems, to track the number of users who visit your site so that you can plan for expansion, and to know which pages of your site are the most popular. You should capture and analyze log data on a regular basis to evaluate the health of your system.

Logs can contain vast amounts of data, so it is important to identify which information is valuable and configure the log files to record only that information. Some applications create a new log file at the beginning of each day. Other applications begin to delete older data or start a new log file when the logs reach a specified size.
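Both rollover policies are easy to reproduce for logs that your own components write. As an illustration only (the chapter does not prescribe a language), Python's standard `logging.handlers` module provides `RotatingFileHandler` for size-based rollover and `TimedRotatingFileHandler` for daily rollover; the file names and limits below are hypothetical.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler, TimedRotatingFileHandler

log_dir = tempfile.mkdtemp()

# Size-based policy: roll over at about 1 KB and keep three old files
# (orders.log.1 to orders.log.3); anything older is deleted automatically.
size_handler = RotatingFileHandler(os.path.join(log_dir, "orders.log"),
                                   maxBytes=1024, backupCount=3)

# Time-based policy: start a new file at midnight and keep 30 old files.
daily_handler = TimedRotatingFileHandler(os.path.join(log_dir, "events.log"),
                                         when="midnight", backupCount=30)

logger = logging.getLogger("site")
logger.setLevel(logging.INFO)
logger.addHandler(size_handler)
logger.addHandler(daily_handler)

for i in range(300):
    logger.info("order %d placed", i)

print(sorted(os.listdir(log_dir)))
```

Whichever policy you choose, the `backupCount` value is where the retention decision lives, so set it to match how long you need the data for analysis.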

It is also important to keep track of which log files contain what data, and where each log file is located, to facilitate analysis. Configure your system to maintain the log files that provide the information you need. Be sure to consider the size of the log files, and archive or delete them often enough to prevent the files from becoming too large. You should design a methodology for analyzing log file data that includes the following:

  • A list of the log files to be analyzed 

  • The frequency of analysis 

  • The data to be analyzed 

  • A distribution list for the resulting reports 

  • A schedule for archiving report data and removing it from your site databases 
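A methodology like this can be recorded as data so that scheduling scripts and people work from the same plan. The following Python sketch is one hypothetical way to do that; the `LogAnalysisPlan` fields map to the bullets above, and the sample entries and retention periods are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

@dataclass
class LogAnalysisPlan:
    log_file: str                # which log file to analyze
    frequency_days: int          # how often to analyze it
    data_of_interest: List[str]  # what data to extract from it
    distribution: List[str]      # who receives the resulting reports
    retention_days: int          # how long report data stays in site databases

    def analysis_due(self, last_run: date, today: date) -> bool:
        """True once frequency_days have passed since the last analysis."""
        return today - last_run >= timedelta(days=self.frequency_days)

    def archive_cutoff(self, today: date) -> date:
        """Report data older than this date should be archived and removed."""
        return today - timedelta(days=self.retention_days)

plan = [
    LogAnalysisPlan("Web server logs", 1, ["hits per page", "errors"], ["web team"], 90),
    LogAnalysisPlan("System event log", 7, ["warnings", "errors"], ["operations"], 365),
]
today = date(2001, 11, 1)
for entry in plan:
    print(entry.log_file, entry.analysis_due(date(2001, 10, 28), today),
          entry.archive_cutoff(today))
```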

Start by identifying which applications are necessary to site operation, and then gather the available log files from those applications and analyze their contents. Answers to the following questions will help you design your analysis methodology:

  • What information do we want to analyze? 

  • Which log files contain that information? 

  • How large are the log files we want to analyze? 

  • How many log files does each server have? 

  • How many servers do we have? 

  • Where are the log files stored on each server? 

  • What reporting application can produce the information we want from the log files? 

  • What input does the reporting application require to produce that information? 

  • What application should we use to transform the raw log file data into input for the reporting application? 

  • How often should we capture and analyze each log file? 
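Several of these questions (how large the log files are, how many each server has, and where they are stored) can be answered with a short inventory script. This Python sketch assumes each server's log folders are reachable as paths, for example through a network share, and treats files ending in .log or .txt as log files; both are assumptions to adjust for your site.

```python
import os
import tempfile

def inventory_logs(roots, suffixes=(".log", ".txt")):
    """Return {root: (file_count, total_bytes)} for log files under each root."""
    summary = {}
    for root in roots:
        count = total = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.lower().endswith(suffixes):
                    count += 1
                    total += os.path.getsize(os.path.join(dirpath, name))
        summary[root] = (count, total)
    return summary

# Demonstration on a temporary folder; in practice, pass one log folder per server.
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "W3SVC1"))
with open(os.path.join(demo, "W3SVC1", "ex011101.log"), "w") as f:
    f.write("x" * 2048)
with open(os.path.join(demo, "default.htm"), "w") as f:
    f.write("not a log file")
print(inventory_logs([demo]))
```

Because `os.walk` silently skips folders it cannot read, an unreachable server shows up as (0, 0); treat zero counts with suspicion rather than as "no logs".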

You must decide whether to manage access to the data by having a central team create requested reports, or by having each interested team access the data and create its own reports. You must also decide how often to capture the log files, based on the nature and the volume of the data they contain.

For example, some log files contain data such as error messages and event notifications that is critical to preventing system failures. Analyzing this data after the system fails can provide clues to the reason for the failure. Access to this data at the time of the event or error might even help you prevent the system from failing.

You can use Commerce Server to import the contents of the Web log files into the Commerce Server Data Warehouse. Then you can use the Analysis modules in Commerce Server Business Desk to run reports. For more information, see "Business Desk Analysis" in Commerce Server 2000 Help.

In addition to your server log files, Commerce Server creates the following log files:

  • Pup.log (created by Commerce Server Site Packager)

  • Setup.log (created by Commerce Server Setup)

  • Debug.log (records all actions for the Profile Designer module in Business Desk) 

  • Basket.log, Total.log, and Checkout.log (created each time a pipeline is used) 

Important: You should use the pipeline log files (Basket.log, Total.log, and Checkout.log) only for debugging purposes. You should not use them in your production environment because they can significantly slow pipeline execution. In addition, they might log and expose sensitive information, such as credit card numbers. Because they are intended only for debugging purposes, they are not thread-safe.

E-mail messages sent out by Commerce Server Direct Mailer are logged in a file in the folder you specify in the Direct Mail Properties dialog box. Direct Mailer creates a new log file every day to log direct mail activities (service starts and stops, jobs processed, and so on). The default location for the Direct Mailer log files is c:\winnt\system32\logfiles\, but you can change that location, if necessary, using Commerce Server Manager.

You can use the advanced features of the Web log file import process to modify log file data to provide the information you want to analyze. To do this, you can set the following properties for the imported data:

  • Default files. Identify different versions of the path into your Web site so that the hit counts for unique visitors to your site are accurate. For example, visitors entering your site using http://www.contoso.tld/ and http://www.contoso.tld/index.htm should be counted as hits on the same page. 

  • Excludes. Prevent the following data from being imported into the Data Warehouse: hits from specific hosts, requests for specific file types or expressions, and hits by crawlers. For example, you can prevent hits on your Web site by users within your corporation from being counted. 

  • Inferences. Customize the assumptions made during import about users and visits. 

  • Log Files. Customize the response to time overlaps in log files. 

  • Query strings. Import Web site query strings, so that you can analyze the data associated with them.
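These properties are set as part of the import process rather than in code, but the logic behind the Default files and Excludes settings is easy to picture. The following Python sketch is a hypothetical stand-in for that logic, not Commerce Server's import implementation; the default-file names, excluded host, file types, and crawler markers are examples only.

```python
from urllib.parse import urlsplit

# Example settings only; real values come from your site's configuration.
DEFAULT_FILES = {"index.htm", "index.html", "default.htm", "default.asp"}
EXCLUDED_HOSTS = {"10.0.0.5"}                   # for example, an internal client
EXCLUDED_SUFFIXES = (".gif", ".jpg", ".css")     # files not counted as page hits
CRAWLER_MARKERS = ("crawler", "spider", "bot")   # crude user-agent test

def normalize_path(path):
    """Map '/', '/index.htm', and '/INDEX.HTM' to one canonical page path."""
    path = urlsplit(path).path.lower()   # drop any query string; assumes case-insensitive URLs
    head, _, tail = path.rpartition("/")
    if tail in DEFAULT_FILES:
        path = head + "/"
    return path or "/"

def keep_hit(client_ip, path, user_agent):
    """Return False for hits that the exclusion rules say not to import."""
    if client_ip in EXCLUDED_HOSTS:
        return False
    if path.lower().endswith(EXCLUDED_SUFFIXES):
        return False
    agent = user_agent.lower()
    return not any(marker in agent for marker in CRAWLER_MARKERS)

hits = [
    ("192.168.1.20", "/", "Mozilla/4.0"),
    ("192.168.1.21", "/index.htm", "Mozilla/4.0"),
    ("10.0.0.5", "/", "Mozilla/4.0"),              # internal user: excluded
    ("192.168.1.22", "/logo.gif", "Mozilla/4.0"),  # image request: excluded
    ("192.168.1.23", "/", "ExampleBot/1.0"),       # crawler: excluded
]
pages = [normalize_path(p) for ip, p, ua in hits if keep_hit(ip, p, ua)]
print(pages)  # ['/', '/']
```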

For more information about setting these properties, see "Running the Data Warehouse" in Commerce Server 2000 Help.

Reports

After you import the Web log files into the Data Warehouse, you use the Analysis reports from Business Desk to analyze the data. When you design your analysis strategy, you need to know who will use the data and how they will use it. The following table lists the different ways in which team members might use log file data.

| Group | Statistics analyzed | Purpose |
|---|---|---|
| Application developers | Application errors and warnings | Monitor the health of the application; identify and anticipate system failures |
| Marketing | Usage statistics | Identify customer demographics; determine how the site is used; identify popular pages or content |
| Site architects | Usage statistics; performance statistics | Plan for expansion, better performance, and increased availability |
| Web designers | Usage statistics | Improve the user interface (UI) and site functionality |

Commerce Server provides a variety of reports that show log file and Web site data in useful formats. For information about the reports shipped with Commerce Server, see "Business Desk Analysis" in the "Working with Business Desk" section in Commerce Server 2000 Help. For information about creating custom Commerce Server reports, see "Creating Custom Reports" in the "Extending Commerce Server" section in Commerce Server 2000 Help.

In general, you should create and use two types of reports:

  • Analytical reports. To analyze the performance of your site and evaluate the success of marketing and content. 

  • Error/warning reports. To identify errors and failures occurring in the system. 
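An error/warning report can start as a simple count of problem events by source. This Python sketch assumes the events have already been exported from the event logs as (level, source, message) tuples; the sample sources and messages are illustrative.

```python
from collections import Counter

def error_warning_report(events):
    """Count errors and warnings per (level, source), most frequent first.

    events: iterable of (level, source, message) tuples. Informational
    events are skipped because this report covers only problems.
    """
    counts = Counter((level, source) for level, source, _ in events
                     if level in ("Error", "Warning"))
    return counts.most_common()

events = [
    ("Error", "W3SVC", "ASP request failed"),
    ("Warning", "MSSQLSERVER", "transaction log nearly full"),
    ("Error", "W3SVC", "ASP request failed"),
    ("Information", "EventLog", "service started"),
]
for (level, source), count in error_warning_report(events):
    print(f"{level:<8} {source:<12} {count}")
```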

When you create your reporting strategy, you should use a tool to convert the data from your log files into useful information. The Data Warehouse performs this conversion as part of the log file import process. The conversion is often referred to as performing aggregations and summations: it changes the raw data collected in the log files into useful information and provides some interpretation of the results.

For example, a Web server log file containing 1,000 separate hits for a page is imported into the Data Warehouse. In the Data Warehouse, the hits are totaled and the results are reported so that you know the page had a total of 1,000 hits.
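The summation in that example can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Data Warehouse implements its aggregations; the (day, page) record layout is an assumption.

```python
from collections import Counter
from datetime import date

def aggregate_hits(hits):
    """Collapse raw (day, page) hit records into one total per day and page."""
    return Counter((day, page) for day, page in hits)

# 1,000 raw hits on the catalog page plus 25 on the basket page, all on one day.
day = date(2001, 11, 1)
raw = [(day, "/catalog/")] * 1000 + [(day, "/basket.asp")] * 25
totals = aggregate_hits(raw)
print(len(raw), "raw records ->", len(totals), "summary rows")
print(totals[(day, "/catalog/")])  # 1000
```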

Setting Up and Performing Operational Procedures


Site management includes the following:

  • Continuous monitoring and routine administrative maintenance, including log file analysis and site backup 

  • Periodic administration, including upgrading hardware and software to improve performance, log archiving, and planning for expansion 

  • Contingency planning and management, including preparing for calamities such as power outages, earthquakes, fire damage, security breaches, hardware and software failures, and the loss of key personnel 

If your goal is to have your Commerce Server site continuously available, you must monitor your site constantly for system failure and for events that necessitate immediate intervention. For more information about setting up operational procedures, see the chapters listed in the following table.

| Chapter | Title | Description |
|---|---|---|
| 5 | Planning for Scalability | Various ways of scaling your site to increase site capacity |
| 6 | Planning for Reliability and High Availability | Various techniques for protecting your site from outages |
| 14 | Deploying Your Site | How to do contingency planning for your site |
| 19 | Maximizing Performance | How to monitor the performance of your system |

Creating a Site Administration Plan

You should set up an administration plan in which you assign monitoring, analysis, and administrative responsibilities. For example, you might decide to form a technology team to respond to system errors and to improve site performance. Or, you might decide to form a marketing team to respond to customers and analyze site usage to improve the commercial success of your site.

Your site administration plan is based on your site development, testing, and contingency planning efforts, and should contain the sections listed in the following table.

| Title | Contains procedures for |
|---|---|
| Site Administration | Performing routine site maintenance and backups; capturing and analyzing log files; site monitoring and event notification; data archiving and database management |
| Problem Management | Mitigating system hardware and software problems. For information about creating a problem management plan, see Chapter 18, "Problem Management." |
| System Monitoring | Monitoring the health of your system to alert you to system failures. For information about system monitoring, see "Monitoring System Health" in Chapter 18. |
| Site Documentation | Maintaining complete documentation for your site hardware, software, and content. For information about site documentation, see "Documenting Your Site" in Chapter 19, "Maximizing Performance." |
| Traffic Analysis | Analyzing site traffic, to get key information such as the number of users visiting your site concurrently. For information about analyzing site traffic, see "Analyzing Traffic" in Chapter 19. |
| Performance Measuring | Measuring site performance, to identify bottlenecks that indicate the need to increase the capacity of the software or hardware running your site. For information about measuring site performance, see "Measuring Performance" in Chapter 19. |
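Concurrent users cannot be read directly from a Web log. However, once requests have been grouped into visits (a step this sketch assumes has already happened, for example by treating a visitor's requests as one session until a period of inactivity), peak concurrency is a sweep over session start and end times. The function name and sample data are hypothetical.

```python
def peak_concurrency(sessions):
    """Return the largest number of sessions active at the same moment.

    sessions: iterable of (start, end) pairs in one comparable unit, such as
    seconds since midnight. A session that ends at time t is counted as gone
    before a session that starts at t.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a visitor arrives
        events.append((end, -1))    # a visitor leaves
    events.sort()  # at equal times, (t, -1) sorts before (t, 1)
    active = peak = 0
    for _, change in events:
        active += change
        peak = max(peak, active)
    return peak

# Hypothetical visits, in seconds since midnight.
visits = [(0, 600), (120, 300), (250, 900), (600, 700)]
print(peak_concurrency(visits))  # 3
```

Sorting the start and end events once makes the sweep O(n log n), which stays practical even for a full day of traffic.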

Your site administration plan should answer the following questions:

  • How should we respond when we receive an alert?

  • Who is responsible for performing site backups and where should we store the media?

  • What should we do if hardware fails?

  • How can we ensure that site upgrades do not interrupt functionality?

  • Are we hosting our site internally? If not, what does our hosting provider take care of?

  • Which teams are responsible for what tasks?

  • Where are the handover points between teams?

  • How are we tracking changes?

  • Who is updating site documentation and what should be documented? 

  • What is our procedure for documenting events and subsequent actions? 

  • What tools should we use to monitor, notify, track, and report events?

  • What site data should we plan to analyze? How should we use the data? 

  • Do we have a plan for manual backup in case of system problems? (For example, if a server malfunctions while a customer is placing an order, do we provide a telephone number on the Web page so that the customer can call and speak to customer service to complete the order?) 

You also need to consider whether to administer your site locally or remotely. If you are using a hosting provider, it is especially important to understand what administrative services the hosting provider will perform.

Your administration plan should also include growth and upgrade scenarios, maintenance schedules and procedures, and a schedule of daily, weekly, monthly, and as-needed activities for administering your site.

Creating and Performing Operational Procedures

You should create a schedule of daily, weekly, monthly, and as-needed activities for operating your site.

Daily activities might include the following:

  • Check logs (server event logs, router logs, and firewall logs) and fix problems, as necessary 

  • Maintain accounts, directories, shares, and security groups 

  • Monitor Web traffic for indications of attacks and plug security holes 

  • Perform and verify backups 

  • Visually inspect indicator lights on servers and hubs 

  • Check available space on all servers 

  • Verify that all services on all servers are running 

  • Ensure that anti-virus software is up-to-date 

  • Monitor replication 

  • Monitor performance 

  • Monitor network traffic 

  • Check print queues 

  • Keep a maintenance log 

  • Monitor the load on the database server 
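The "Check available space on all servers" task is easy to automate. A minimal Python sketch using the standard `shutil.disk_usage` function; the paths and the 15 percent threshold are assumptions, and the script would run on each server you want to check.

```python
import os
import shutil

def check_free_space(paths, min_free_percent=15.0):
    """Return {path: percent_free} for every path whose volume is below the threshold."""
    low = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
        percent_free = 100.0 * usage.free / usage.total
        if percent_free < min_free_percent:
            low[path] = round(percent_free, 1)
    return low

# Check the volume that holds the current directory; on a Web server you
# might list the system drive and the drives holding content and logs.
print(check_free_space([os.getcwd()]))
```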

Weekly activities might include the following:

  • Clean servers 

  • Produce reports on the week's activity 

  • Update software, as necessary 

  • Audit the network for unauthorized changes 

Monthly activities might include the following:

  • Rebuild databases, if needed 

  • Produce reports on activity for the month 

  • Change passwords 

  • Manage off-site storage of backup media 

  • Perform a system vulnerability analysis 

Initial or as-needed activities might include the following:

  • Practice recovering from disaster 

  • Document the full network 

  • Rebuild corrupt servers 

  • Test the recovery procedure 

  • Get a performance baseline 

The actions listed are just a starting point. There are many more actions you can add to these lists to ensure that your site operates at an optimal level.

Managing Security

Managing security includes activities designed to maintain, improve, and restore (when necessary) the security of your site. Security plays a critical role in the success of an online e-commerce site. You must be able to protect the interests and confidential information of both your business and your customers.

As part of managing security, you should:

  • Monitor your site for security breaches and holes. 

  • Maintain the most current anti-virus protection. 

  • Constantly research industry security issues, product reviews, and threats.

In addition to monitoring for external threats, you must also guard against internal threats by controlling and monitoring the number of individuals inside your organization who have administrative permissions to your Web site servers.

Managing Changes

You must create procedures for implementing requested features and changes on your site. You should always implement changes in a test environment and thoroughly test any changes you make before moving them to your production site. You should also update your site documentation with information about any changes you make. For more information about setting up a process for managing changes, see "Managing Change" in Chapter 8, "Developing Your Site."

Backing Up and Restoring Site Data

Your requirements for the availability of site data determine the content of your site backups. Performing a daily backup of your site is critical. For maximum security, store backups offsite in a secure, fireproof and waterproof environment. Your backup strategy should specify the following:

  • Type and frequency of backups 

  • Hardware and software to use to perform backups 

  • Type of media to use for backups 

  • Frequency with which to recycle the media 

  • Secure location (onsite and offsite) in which to store the backups 

  • A method for managing the security of the backup location 

To ensure reliable recovery of your site in case of disaster, you must thoroughly test your backup and recovery procedures. Test different failure scenarios to be sure that you can recover quickly from different types and severities of failures.
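For file-based content, one part of testing backups is confirming that each backed-up file matches its source. The following Python sketch compares SHA-256 digests of a source folder and its backup copy; the folder names and sample files are hypothetical. A matching copy is necessary but not sufficient: you still need to rehearse an actual restore, and database backups are better checked with SQL Server's own tools, such as RESTORE VERIFYONLY.

```python
import hashlib
import os
import shutil
import tempfile

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(source_dir, backup_dir):
    """Return relative paths that are missing from, or differ in, the backup."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(backup_dir, rel)
            if not os.path.isfile(dst) or file_digest(src) != file_digest(dst):
                problems.append(rel)
    return sorted(problems)

# Demonstration: back up two files, then simulate a damaged copy.
src = tempfile.mkdtemp()
bak = os.path.join(tempfile.mkdtemp(), "copy")
for name, text in [("default.asp", '<% Response.Write("Home") %>'), ("logo.gif", "GIF89a")]:
    with open(os.path.join(src, name), "w") as f:
        f.write(text)
shutil.copytree(src, bak)
with open(os.path.join(bak, "logo.gif"), "w") as f:
    f.write("damaged")
print(verify_backup(src, bak))  # ['logo.gif']
```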

To develop a successful backup and recovery plan, you must identify the data that is critical to your business and know the frequency with which it changes. Many of your decisions should be driven by data availability, the financial cost of your site being inaccessible, whether or not you can recreate lost data, the size and type of the data to be backed up, and the complexity of your site.

You also need to determine whether to perform full site backups or to back up site components individually. Assuming that you can re-create the architecture of your site, you should back up the following:

  • Commerce Server databases, including the Administration database and the Data Warehouse

  • All content, including Active Server Pages (ASP), dynamic-link library (DLL), Graphics Interchange Format (GIF), and Hypertext Markup Language (HTML) files 

  • Web site log files, especially if you are actively analyzing site traffic data 
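Copying the content files in the list above can be scripted. This Python sketch copies files with the listed extensions from a site folder to a backup folder, keeping relative paths. The folder names are hypothetical, and the extension set covers only the file types named above, so you would extend it to cover the rest of your content.

```python
import os
import shutil
import tempfile

# The file types named above; extend the set to cover all of your site's content.
CONTENT_EXTENSIONS = {".asp", ".dll", ".gif", ".htm", ".html"}

def backup_content(site_root, backup_root, extensions=CONTENT_EXTENSIONS):
    """Copy matching files under site_root into backup_root, keeping relative paths."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                src = os.path.join(dirpath, name)
                rel = os.path.relpath(src, site_root)
                dst = os.path.join(backup_root, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also preserves file timestamps
                copied.append(rel)
    return sorted(copied)

# Demonstration with a throwaway site folder.
site = tempfile.mkdtemp()
os.makedirs(os.path.join(site, "images"))
for rel in ("default.asp", os.path.join("images", "logo.gif"), "notes.tmp"):
    with open(os.path.join(site, rel), "w") as f:
        f.write("content")
copied = backup_content(site, os.path.join(tempfile.mkdtemp(), "site-backup"))
print(copied)
```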

You should also back up metadata and registry information, and other site architecture and implementation information. The following practices can help you reduce the time it takes to recover your data after a disaster:

  • Use archiving to reduce the size of your Commerce Server databases. Archiving enables you to retain historical data, yet clear space in your site databases.

  • Use multiple backup devices simultaneously. 

  • Use a combination of full-database, differential-database, and transaction-log backups to minimize the number of backups that must be applied at the point of failure. 

  • Use file and file-group backups and transaction log backups. Back up only those files that contain relevant data. 

  • Use snapshot backups to minimize or eliminate the use of server resources in the backup process. (Snapshot backups require third-party hardware and software.) 
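The benefit of combining full, differential, and transaction-log backups is easiest to see by working out which backups a restore needs. This Python sketch applies the usual rule for such schemes: the latest full backup, then the latest differential taken after it, then every later log backup up to the failure. The `Backup` type and the schedule are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass(frozen=True)
class Backup:
    kind: str        # "full", "differential", or "log"
    taken: datetime  # when the backup finished

def restore_chain(backups: List[Backup], failure: datetime) -> List[Backup]:
    """Return the backups to apply, in order, to recover up to the failure."""
    usable = sorted((b for b in backups if b.taken <= failure), key=lambda b: b.taken)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup was taken before the failure")
    chain = [fulls[-1]]                      # 1. most recent full backup
    diffs = [b for b in usable
             if b.kind == "differential" and b.taken > chain[0].taken]
    if diffs:
        chain.append(diffs[-1])              # 2. latest differential since that full
    start = chain[-1].taken
    chain.extend(b for b in usable           # 3. every log backup after that point
                 if b.kind == "log" and b.taken > start)
    return chain

schedule = [Backup("full", datetime(2001, 11, 4, 0, 0))]
schedule += [Backup("log", datetime(2001, 11, 4, h, 0)) for h in (6, 12, 18)]
schedule += [Backup("differential", datetime(2001, 11, 5, 0, 0))]
schedule += [Backup("log", datetime(2001, 11, 5, h, 0)) for h in (6, 12, 18)]

for b in restore_chain(schedule, failure=datetime(2001, 11, 5, 14, 30)):
    print(b.kind, b.taken)
```

With this schedule, the differential replaces the three log backups taken on November 4, so four backups are applied instead of six. The sketch does not check that the log backups form an unbroken sequence, and it ignores changes logged after the last log backup.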

For more information about these practices, see "Backing Up and Restoring Databases" and "Archiving and Restoring Databases" in SQL Server Books Online.

For more information about tools and techniques for backing up and restoring Commerce Server, see "Backing Up and Restoring Commerce Server" and "Backing Up and Restoring a SQL Server Database" in Commerce Server 2000 Help. Also, see "Backing Up Your Site" in Chapter 14, "Deploying Your Site."
