Microsoft Commerce Server 2000: Site Management
During the Management phase, you continue to monitor, test, and resolve problems in the hardware, software, and content of your site. You analyze the data you collect by monitoring site activity, and then use that data to improve site performance from both a technological and a marketing perspective. Finally, you create and perform operational procedures, such as backup, recovery, and log capture, for administering the day-to-day operation of your site. Figure 17.1 shows a high-level view of the management process.
Figure 17.1 High-level view of the site management process
This chapter describes how to:
Perform a site checkup
Monitor and analyze log data
Set up and perform the operational procedures necessary to manage your site
In addition, the Management section of this book contains the chapters listed in the following table.
Chapter | Title | Description |
---|---|---|
18 | Problem Management | Best practices for managing problems and troubleshooting your Microsoft Commerce Server 2000 site |
19 | Maximizing Performance | Methods for creating site usage profiles, and analyzing and managing site performance |
Performing a Site Checkup
E-commerce sites are dynamic, and the requirements for a successful site can change dramatically over time. For example, your product line might change, site visitor usage profiles might shift, and from time to time you need to introduce new software, new catalogs, and other new content. Over time, the cumulative impact of these changes can affect the stability of your site.
It is a good idea to periodically conduct a site "checkup" to make sure that everything is working properly. A good time to conduct the checkup is prior to gearing up for the holiday shopping season, to give you time to correct any problems that might have crept into your site during the previous year, and to be sure that your customers have the best possible shopping experience.
Your site checkup should be a collaborative process, involving your system administrators, development staff, and business management. If other companies are developing or managing your site, you should include them as well. The following table lists the types of questions to ask during the checkup. You might also have additional questions specific to your site.
Category | Questions | Comments |
---|---|---|
Backup and maintenance | · What procedures do we have for rebuilding services? | Regular backup and maintenance procedures ensure that you can identify and access all parts of your site, if necessary. Reconstructing a site is an effective way to make sure all the parts are available. |
Event logs | · What warnings and errors have been occurring in our system and application event logs? | Event logs provide a useful gauge for the health of your system. |
Load | · What is our current site load for the following: | Carefully managing site load is an effective means of improving site stability. Many sites experience large increases in traffic during the holiday season or as a result of advertising campaigns. Simulating the projected load is an effective way to make sure your site can handle the increased load. |
Security | · When was our most recent security audit? | Security requires constant vigilance. |
Software changes | · When was the last version or service pack applied? | Software upgrades, no matter how simple, introduce change and threaten system stability. Over time, software can drift from the specified configuration, making it difficult to identify problems and impossible to rebuild a server for debugging purposes. |
Software problems | · How many software problems have been reported since the last version or service pack was installed? | Problem counts are an effective way to track site quality. Although there will always be problems, a stable site should show a decreasing problem find rate. |
Stability and availability | · How stable is our site, on a scale of 1 to 5? | Eliminating single points of failure can dramatically increase site availability. For suggestions for eliminating single points of failure, see Chapter 6, "Planning for Reliability and High Availability." |
Support | · Is our staffing and problem escalation planning adequate for our availability requirements? | The ability to efficiently report and escalate problems to knowledgeable sources is important for ensuring a healthy site. |
Monitoring and Analyzing Log Data
For very serious errors and events, you should install an automated alarm system that continuously monitors your log files and sends notification (for example, an e-mail or pager message) when a particular type of error or event occurs. Your contingency plans should include notification and action scenarios. For more information about developing a contingency plan, see Chapter 14, "Deploying Your Site."
An alarm system scans for predefined errors and events by continuously monitoring the data written to any log files that you specify. In addition to errors and events, alarm systems can check for highs and lows in performance counters. You can configure an alarm system by setting priorities and responses for error, event, and failure information.
Your alarm system should respond to:
Backups that fail
System resources that become dangerously low
Services that stop unexpectedly
Events or system states that can affect site functionality
In addition to sending e-mail and pager messages, your alarm system might log the event or notification in a special file, run a script or program to correct the problem (for example, restarting a service), or log ancillary data to help you troubleshoot the error. For more information about system monitoring, see Chapter 19, "Maximizing Performance."
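The scanning behavior described above can be illustrated with a minimal Python sketch. It is not part of Commerce Server or any particular monitoring product. The rule names, the regular expressions, the sample log lines, and the `check_counter` helper are all assumptions made for illustration; a real alarm system would read your actual log formats and send notifications rather than print.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str      # hypothetical label for the alert
    pattern: str   # regular expression matched against each log line
    priority: int  # 1 = most urgent


# Illustrative rules only; real patterns depend on your log formats.
RULES = [
    Rule("backup_failed", r"backup .*failed", 1),
    Rule("service_stopped", r"service .* (stopped|terminated) unexpectedly", 1),
    Rule("low_disk", r"disk space .*low", 2),
]


def scan_lines(lines, rules=RULES):
    """Return (priority, rule name, line) for every line that matches a rule."""
    alerts = []
    for line in lines:
        for rule in rules:
            if re.search(rule.pattern, line, re.IGNORECASE):
                alerts.append((rule.priority, rule.name, line.strip()))
                break  # the first matching rule wins for this line
    # sorted() is stable, so log order is kept within each priority
    return sorted(alerts, key=lambda alert: alert[0])


def check_counter(name, value, low, high):
    """Return an alert message if a counter value falls outside [low, high]."""
    if value < low:
        return f"{name} below {low}: {value}"
    if value > high:
        return f"{name} above {high}: {value}"
    return None


# Made-up sample lines standing in for a real log file.
sample = [
    "2000-11-01 02:00:03 Nightly backup of catalog database failed",
    "2000-11-01 02:05:10 Order processed",
    "2000-11-01 02:07:44 Disk space on D: is low",
]

for priority, name, line in scan_lines(sample):
    print(priority, name, line)
```

Keeping the rules in a plain list makes it easy to review priorities alongside your contingency plan and to add a response action (for example, restarting a service) per rule.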
Analyzing Log Data
The log files created by your software applications record site usage, operational events, and performance data, as well as errors and warnings. Log files store the history of events within a system, and are often the only way to detect and trace an intrusion by a hacker. You can use the data in log files to diagnose server problems, to track the number of users who visit your site so that you can plan for expansion, and to know which pages of your site are the most popular. You should capture and analyze log data on a regular basis to evaluate the health of your system.
Logs can contain vast amounts of data, so it is important to identify which information is valuable and configure the log files to record only that information. Some applications create a new log file at the beginning of each day. Other applications begin to delete older data or start a new log file when the logs reach a specified size.
It is also important to keep track of which log files contain what data, and where each log file is located, to facilitate analysis. Configure your system to maintain the log files that provide the information you need. Be sure to consider the size of the log files, and archive or delete them often enough to prevent the files from becoming too large. You should design a methodology for analyzing log file data that includes the following:
A list of the log files to be analyzed
The frequency of analysis
The data to be analyzed
A distribution list for the resulting reports
A schedule for archiving report data and removing it from your site databases
Start by identifying which applications are necessary to site operation, and then gather the available log files from those applications and analyze their contents. Answers to the following questions will help you design your analysis methodology:
What information do we want to analyze?
Which log files contain that information?
How large are the log files we want to analyze?
How many log files does each server have?
How many servers do we have?
Where are the log files stored on each server?
What reporting application can produce log file storage information?
What input does the reporting application require to produce that information?
What application should we use to transform the raw log file data into input for the reporting application?
How often should we capture and analyze each log file?
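Several of these questions (how many log files, how large, and where) can be answered with a short inventory script. The following Python sketch is one possible approach, not a Commerce Server tool. The `.log` extension filter and the temporary folder used in the demonstration are assumptions; in practice you would point the function at the log folders on each server.

```python
import os
import tempfile


def inventory_logs(root, extension=".log"):
    """Walk root and return (file count, total bytes, path of largest file)."""
    count, total = 0, 0
    largest, largest_size = None, -1
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if name.lower().endswith(extension):
                path = os.path.join(folder, name)
                size = os.path.getsize(path)
                count += 1
                total += size
                if size > largest_size:
                    largest, largest_size = path, size
    return count, total, largest


# Demonstration against a throwaway folder with made-up files.
with tempfile.TemporaryDirectory() as root:
    for name, size in [("a.log", 10), ("b.log", 30), ("notes.txt", 5)]:
        with open(os.path.join(root, name), "wb") as handle:
            handle.write(b"x" * size)
    count, total, largest = inventory_logs(root)
    print(count, total, os.path.basename(largest))
```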
You must decide whether you want to manage access to the data by having a central team create requested reports, or by having each interested team access the data and create their own reports. You must also decide how often to capture the log files, based on the nature and the volume of the data they contain.
For example, some log files contain data such as error messages and event notifications that is critical to preventing system failures. Analyzing this data after the system fails can provide clues to the reason for the failure. Access to this data at the time of the event or error might even help you prevent the system from failing.
You can use Commerce Server to import the contents of the Web log files into the Commerce Server Data Warehouse. Then you can use the Analysis modules in Commerce Server Business Desk to run reports. For more information, see "Business Desk Analysis" in Commerce Server 2000 Help.
In addition to your server log files, Commerce Server creates the following log files:
Pup.log (created by Commerce Server Site Packager)
Setup.log (created by Commerce Server Setup)
Debug.log (records all actions for the Profile Designer module in Business Desk)
Basket.log, Total.log, and Checkout.log (created each time a pipeline is used)
Important: You should use the pipeline log files (Basket.log, Total.log, and Checkout.log) only for debugging purposes. You should not use them in your production environment because they can significantly slow pipeline execution. In addition, they might log and expose sensitive information, such as credit card numbers. Because they are intended only for debugging purposes, they are not thread-safe.
E-mail messages sent out by Commerce Server Direct Mailer are logged in a file in the folder you specify in the Direct Mail Properties dialog box. Direct Mailer creates a new log file every day to log direct mail activities (service starts and stops, jobs processed, and so on). The default location for the Direct Mailer log files is c:\winnt\system32\logfiles\, but you can change that location, if necessary, using Commerce Server Manager.
You can use the advanced features of the Web log file import process to modify log file data to provide the information you want to analyze. To do this, you can set the following properties for the imported data:
Default files. Identify different versions of the path into your Web site so that the hit counts for unique visitors to your site are accurate. For example, visitors entering your site using http://www.contoso.tld/ and http://www.contoso.tld/index.htm should be counted as hits on the same page.
Excludes. Prevent the following data from being imported into the Data Warehouse: hits from specific hosts, requests for specific file types or expressions, and hits by crawlers. For example, you can exclude hits by users within your corporation so that they are not counted.
Inferences. Customize the assumptions made during import about users and visits.
Log Files. Customize the response to time overlaps in log files.
Query strings. Import Web site query strings, so that you can analyze the data associated with them.
For more information about setting these properties, see "Running the Data Warehouse" in Commerce Server 2000 Help.
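To make the ideas behind the default files and excludes properties concrete, here is a minimal Python sketch of the same kind of cleanup. It does not reproduce the Commerce Server import process. The default file names, excluded host address, file extensions, and crawler markers are illustrative assumptions only.

```python
from urllib.parse import urlsplit

# All of the values below are assumptions chosen for illustration.
DEFAULT_FILES = {"index.htm", "index.html", "default.asp"}
EXCLUDED_HOSTS = {"10.0.0.5"}            # for example, an internal corporate host
EXCLUDED_EXTENSIONS = (".gif", ".jpg", ".css")
CRAWLER_MARKERS = ("bot", "crawler", "spider")


def normalize_path(url):
    """Map a default-file URL and its folder URL to the same path."""
    path = urlsplit(url).path or "/"
    head, _separator, last = path.rpartition("/")
    if last.lower() in DEFAULT_FILES:
        path = head + "/"
    return path


def should_import(host, url, user_agent):
    """Return False for hits that the exclude rules would filter out."""
    if host in EXCLUDED_HOSTS:
        return False
    if urlsplit(url).path.lower().endswith(EXCLUDED_EXTENSIONS):
        return False
    agent = user_agent.lower()
    if any(marker in agent for marker in CRAWLER_MARKERS):
        return False
    return True


# Both entry URLs from the text resolve to the same page.
print(normalize_path("http://www.contoso.tld/"))
print(normalize_path("http://www.contoso.tld/index.htm"))
```

Normalizing before counting means a visitor who arrives through either URL contributes to a single page total rather than splitting the count across two entries.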
Reports
After you import the Web log files into the Data Warehouse, you use the Analysis reports from Business Desk to analyze the data. When you design your analysis strategy, you need to know who will use the data and how they will use it. The following table lists the different ways in which team members might use log file data.
Group | Statistics analyzed | Purpose |
---|---|---|
Application developers | Application errors and warnings | · Monitor the health of the application |
Marketing | Usage statistics | · Identify customer demographics |
Site architects | · Usage statistics | Plan for expansion, better performance, and increased availability |
Web designers | Usage statistics | Improve the user interface (UI) and site functionality |
Commerce Server provides a variety of reports that show log file and Web site data in useful formats. For information about the reports shipped with Commerce Server, see "Business Desk Analysis" in the "Working with Business Desk" section in Commerce Server 2000 Help. For information about creating custom Commerce Server reports, see "Creating Custom Reports" in the "Extending Commerce Server" section in Commerce Server 2000 Help.
In general, you should create and use two types of reports:
Analytical reports. To analyze the performance of your site and evaluate the success of marketing and content.
Error/warning reports. To identify errors and failures occurring in the system.
When you create your reporting strategy, you should use a tool to convert the raw data in your log files into useful information. The Data Warehouse performs this conversion as part of the log file import process. The conversion is often referred to as performing aggregations and summations, and it provides some interpretation of the results.
For example, a Web server log file containing 1,000 separate hits for a page is imported into the Data Warehouse. In the Data Warehouse, the hits are totaled and the results are reported so that you know the page had a total of 1,000 hits.
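The aggregation in this example can be sketched in a few lines of Python. This is only an illustration of totaling hits per page, not the Data Warehouse implementation. The three-field log line layout (date, time, page path) is an assumption, and real Web server logs usually carry more fields.

```python
from collections import Counter


def count_hits(log_lines, path_field=2):
    """Total the hits per page from whitespace-separated log lines."""
    hits = Counter()
    for line in log_lines:
        # Skip blank lines and header lines that begin with "#".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) > path_field:
            hits[fields[path_field]] += 1
    return hits


# Made-up log: 1,000 separate hits on one page and 25 on another.
lines = ["2000-11-01 12:00:00 /catalog.asp"] * 1000
lines += ["2000-11-01 12:00:01 /basket.asp"] * 25

totals = count_hits(lines)
print(totals["/catalog.asp"])
```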
Setting Up and Performing Operational Procedures
Site management includes the following:
Continuous monitoring and routine administrative maintenance, including log file analysis and site backup
Periodic administration, including upgrading hardware and software to improve performance, log archiving, and planning for expansion
Contingency planning and management, including preparing for calamities such as power outages, earthquakes, fire damage, security breaches, hardware and software failures, and the loss of key personnel
If your goal is to have your Commerce Server site continuously available, you must monitor your site constantly for system failure and for events that necessitate immediate intervention. For more information about setting up operational procedures, see the chapters listed in the following table.
Chapter | Title | Description |
---|---|---|
5 | Planning for Scalability | Various ways of scaling your site to increase site capacity |
6 | Planning for Reliability and High Availability | Various techniques for protecting your site from outages |
14 | Deploying Your Site | How to do contingency planning for your site |
19 | Maximizing Performance | How to monitor the performance of your system |
Creating a Site Administration Plan
You should set up an administration plan in which you assign monitoring, analysis, and administrative responsibilities. For example, you might decide to form a technology team to respond to system errors and to improve site performance. Or, you might decide to form a marketing team to respond to customers and analyze site usage to improve the commercial success of your site.
Your site administration plan is based on your site development, testing, and contingency planning efforts, and should contain the sections listed in the following table.
Title | Contains procedures for |
---|---|
Site Administration | · Performing routine site maintenance and backups |
Problem Management | Mitigating system hardware and software problems. For information about creating a problem management plan, see Chapter 18, "Problem Management." |
System Monitoring | Monitoring the health of your system to alert you to system failures. For information about system monitoring, see "Monitoring System Health" in Chapter 18. |
Site Documentation | Maintaining complete documentation for your site hardware, software, and content. For information about site documentation, see "Documenting Your Site" in Chapter 19, "Maximizing Performance." |
Traffic Analysis | Analyzing site traffic, to get key information such as the number of users visiting your site concurrently. For information about analyzing site traffic, see "Analyzing Traffic" in Chapter 19. |
Performance Measuring | Measuring site performance, to identify bottlenecks that indicate the need to increase the capacity of the software or hardware running your site. For information about measuring site performance, see "Measuring Performance" in Chapter 19. |
Your site administration plan should answer the following questions:
How should we respond when we receive an alert?
Who is responsible for performing site backups and where should we store the media?
What should we do if hardware fails?
How can we ensure that site upgrades do not interrupt functionality?
Are we hosting our site internally? If not, what does our hosting provider take care of?
Which teams are responsible for what tasks?
Where are the handover points between teams?
How are we tracking changes?
Who is updating site documentation and what should be documented?
What is our procedure for documenting events and subsequent actions?
What tools should we use to monitor, notify, track, and report events?
What site data should we plan to analyze? How should we use the data?
Do we have a plan for manual backup in case of system problems? (For example, if a server malfunctions while a customer is placing an order, do we provide a telephone number on the Web page so that the customer can call and speak to customer service to complete the order?)
You also need to consider whether to administer your site locally or remotely. If you are using a hosting provider, it is especially important to understand what administrative services the hosting provider will perform.
Your administration plan should also include growth and upgrade scenarios, maintenance schedules and procedures, and a schedule of daily, weekly, monthly, and as-needed activities for administering your site.
Creating and Performing Operational Procedures
You should create a schedule of daily, weekly, monthly, and as-needed activities for operating your site.
Daily activities might include the following:
Check logs (server event logs, router logs, and firewall logs) and fix problems, as necessary
Maintain accounts, directories, shares, and security groups
Monitor Web traffic for indications of attacks and plug security holes
Perform and verify backups
Visually inspect indicator lights on servers and hubs
Check available space on all servers
Verify that all services on all servers are running
Ensure that anti-virus software is up-to-date
Monitor replication
Monitor performance
Monitor network traffic
Check print queues
Keep a maintenance log
Monitor the load on the database server
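Some of these daily checks lend themselves to scripting. As one example, the following Python sketch flags volumes whose free space has dropped below a threshold. The 10 percent threshold and the list of paths are assumptions; choose values that match your own servers.

```python
import shutil


def low_space_volumes(paths, min_free_fraction=0.10):
    """Return (path, free fraction) for volumes below the free-space threshold."""
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)
        if usage.total == 0:
            continue  # nothing meaningful to report for an empty volume
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            low.append((path, free_fraction))
    return low


# Check the volume that holds the current folder.
for path, fraction in low_space_volumes(["."]):
    print(f"{path}: only {fraction:.0%} free")
```

Running a script like this on a schedule, and recording the result in the maintenance log, turns a manual inspection into a repeatable check.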
Weekly activities might include the following:
Clean servers
Produce reports on the week's activity
Update software, as necessary
Audit the network for unauthorized changes
Monthly activities might include the following:
Rebuild databases, if needed
Produce reports on activity for the month
Change passwords
Manage off-site storage of backup media
Perform a system vulnerability analysis
Initial or as-needed activities might include the following:
Practice recovering from disaster
Document the full network
Rebuild corrupt servers
Test the recovery procedure
Get a performance baseline
The actions listed are just a starting point. There are many more actions you can add to these lists to ensure that your site operates at an optimal level.
Managing Security
Managing security includes activities designed to maintain, improve, and restore (when necessary) the security of your site. Security plays a critical role in the success of an online e-commerce site. You must be able to protect the interests and confidential information of both your business and your customers.
As part of managing security, you should:
Monitor your site for security breaches and holes.
Maintain the most current anti-virus protection.
Constantly research industry security issues, product reviews, and threats.
In addition to monitoring for external threats, you must also guard against internal threats by controlling and monitoring the number of individuals inside your organization who have administrative permissions to your Web site servers.
Managing Changes
You must create procedures for implementing requested features and changes on your site. You should always implement changes in a test environment and thoroughly test any changes you make before moving them to your production site. You should also update your site documentation with information about any changes you make. For more information about setting up a process for managing changes, see "Managing Change" in Chapter 8, "Developing Your Site."
Backing Up and Restoring Site Data
Your requirements for the availability of site data determine the content of your site backups. Performing a daily backup of your site is critical. For maximum security, store backups offsite in a secure, fireproof and waterproof environment. Your backup strategy should specify the following:
Type and frequency of backups
Hardware and software to use to perform backups
Type of media to use for backups
The frequency with which you should recycle the media
Secure location (onsite and offsite) in which to store the backups
A method for managing the security of the backup location
To ensure reliable recovery of your site in case of disaster, you must thoroughly test your backup and recovery procedures. Test different failure scenarios to be sure that you can recover quickly from different types and severities of failures.
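One simple check you might include in a recovery test is to compare a restored file against its original. The following Python sketch does this with SHA-256 digests. It is an illustration only: the file names are made up, and the file copy stands in for a real backup and restore cycle.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restored_copy_matches(original, restored):
    """Return True if both files have identical contents."""
    return file_digest(original) == file_digest(restored)


with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "catalog.dat")           # hypothetical file
    restored = os.path.join(folder, "catalog_restored.dat")  # hypothetical file
    with open(original, "wb") as handle:
        handle.write(b"sample catalog data")
    shutil.copyfile(original, restored)  # stands in for backup and restore
    print(restored_copy_matches(original, restored))
```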
To develop a successful backup and recovery plan, you must identify the data that is critical to your business and know the frequency with which it changes. Many of your decisions should be driven by data availability, the financial cost of your site being inaccessible, whether or not you can recreate lost data, the size and type of the data to be backed up, and the complexity of your site.
You also need to determine whether to perform full site backups or to back up site components individually. Assuming that you can re-create the architecture of your site, you should back up the following:
Commerce Server databases, including the Administration database and the Data Warehouse
All content, including Active Server Pages (ASP), dynamic-link library (DLL), Graphics Interchange Format (GIF), and Hypertext Markup Language (HTML) files
Web site log files, especially if you are actively analyzing site traffic data
You should also back up metadata and registry information, and other site architecture and implementation information. The following practices can help you reduce the time it takes to recover your data after a disaster:
Use archiving to reduce the size of your Commerce Server databases. Archiving enables you to retain historical data, yet clear space in your site databases.
Use multiple backup devices simultaneously.
Use a combination of full-database, differential-database, and transaction-log backups to minimize the number of backups that must be applied at the point of failure.
Use file and file-group backups and transaction log backups. Back up only those files that contain relevant data.
Use snapshot backups to minimize or eliminate the use of server resources in the backup process. (Snapshot backups require third-party hardware and software.)
For more information about these practices, see "Backing Up and Restoring Databases" and "Archiving and Restoring Databases" in SQL Server Books Online.
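To show why combining full, differential, and transaction-log backups reduces the number of backups to apply, the following Python sketch picks a restore sequence from a backup history. It is a simplified model, not a SQL Server tool: the `Backup` class, the integer timestamps, and the sample history are assumptions. The sequence it builds is the most recent full backup, then the most recent differential taken after it (if any), then every transaction-log backup taken after that point.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    kind: str      # "full", "diff", or "log"
    taken_at: int  # hypothetical timestamp, for example hours since the first backup


def restore_sequence(backups):
    """Return the backups to apply, in order, to reach the latest state."""
    ordered = sorted(backups, key=lambda backup: backup.taken_at)
    fulls = [b for b in ordered if b.kind == "full"]
    if not fulls:
        return []  # nothing can be restored without a full backup
    full = fulls[-1]
    diffs = [b for b in ordered if b.kind == "diff" and b.taken_at > full.taken_at]
    base = diffs[-1] if diffs else full
    logs = [b for b in ordered if b.kind == "log" and b.taken_at > base.taken_at]
    return [full] + diffs[-1:] + logs


# Made-up history: one full backup, two differentials, four log backups.
history = [
    Backup("full", 0), Backup("log", 6), Backup("diff", 12), Backup("log", 18),
    Backup("diff", 24), Backup("log", 30), Backup("log", 36),
]

for backup in restore_sequence(history):
    print(backup.kind, backup.taken_at)
```

In this history, seven backups exist but only four need to be applied, because the latest differential replaces the earlier differential and the log backups that preceded it.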
For more information about tools and techniques for backing up and restoring Commerce Server, see "Backing Up and Restoring Commerce Server" and "Backing Up and Restoring a SQL Server Database" in Commerce Server 2000 Help. Also, see "Backing Up Your Site" in Chapter 14, "Deploying Your Site."