Administering Usage Analysis with the Microsoft FrontPage 2002 Server Extensions

Web site usage analysis is a new feature in Microsoft® FrontPage® 2002 that provides Web site authors with a variety of usage statistics. This feature relies on server-side processing of the Web server's log files to compile these statistics. This white paper focuses on the setup and administration of the server-side log file processing. It is assumed the reader is familiar with administering Microsoft FrontPage 2002 Server Extensions or SharePoint Team Services from Microsoft.

On This Page

General Information
About Usage Data
About Log Fields
Server Configuration
Scheduling Usage Processing on UNIX

General Information

Setup

On the Windows platform, usage processing is configured properly out of the box. However, some IIS configuration changes are recommended to provide a better end-user experience. On the UNIX platform, you need to take several steps to configure your system for usage processing.

If you are looking for quick instructions on configuring your server (Windows or UNIX) for usage reports, you can just read the Server Configuration section below.

Background

The usage report data is generated by a process running on the Web server which analyzes the Web server log files. This process should be run on a regular schedule (daily or weekly) to parse the log files and update the FrontPage metadata with the latest usage information.

On the Windows platform, the FrontPage Server Extensions and SharePoint Team Services use the SharePoint Team Services Timer Service, which is responsible for (among other things) launching usage processing. The Timer Service is a regular Microsoft Windows NT Service and can be administered through the Administration Pages or the owsadm.exe command line utility. Out of the box, the Timer Service is configured to run usage processing every Sunday at midnight.

On the UNIX platform you must use the cron facility to schedule usage processing. For more information, see the Apache 1.3.3+ section below.

You can also launch usage processing manually by running owsadm.exe -o usage.

How usage analysis processing works

When usage processing is run, it follows a three-step process as outlined below.

  • Step 1 – Log file discovery

    In the log file discovery phase, a list of log files to process is built. On the first run, log files for the past two months are added to this list. Subsequent runs include only those files that have been created since the previous run.

  • Step 2 – Log file parsing

    Each log file discovered is parsed line by line. Hit counts are accumulated on a per-site and per-file basis. Some hits are ignored, including the following:

    • Any hits where the URI contains _vti_

    • Any hits with an HTTP status other than 200 or 304

    • Any hits to virtual directories that do not contain a FrontPage subweb.

  • Step 3 – Storing data

    All the statistics accumulated in memory are written to disk into the FrontPage metadata one Web site at a time. Before the data is written, a backup file is created in case the write operation fails (for example, the Web site might be locked for reading). The backup files are left on disk to be included on a later run of usage processing.
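The filtering in Step 2 can be illustrated with a minimal Python sketch. This is not the actual FrontPage implementation; the function names are hypothetical, the `_vti_` check is assumed to be a case-sensitive substring test, and the third rule (virtual directories without a FrontPage subweb) is left out because it depends on server configuration that is not modeled here.

```python
# Statuses that count as hits, per the list above.
COUNTED_STATUSES = {200, 304}


def is_counted_hit(uri, status):
    """Return True if a hit passes the first two filters from Step 2."""
    if "_vti_" in uri:  # assumption: case-sensitive substring match
        return False
    return status in COUNTED_STATUSES


def count_hits(hits):
    """Accumulate per-file hit counts from (uri, status) pairs."""
    counts = {}
    for uri, status in hits:
        if is_counted_hit(uri, status):
            counts[uri] = counts.get(uri, 0) + 1
    return counts


sample = [
    ("/index.htm", 200),
    ("/index.htm", 304),
    ("/_vti_bin/shtml.dll", 200),  # ignored: URI contains _vti_
    ("/missing.htm", 404),         # ignored: status is not 200 or 304
]
print(count_hits(sample))
```

In this sample only the two requests for /index.htm are counted.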

About Usage Data

As mentioned above, the usage analysis data is stored on disk in the FrontPage metadata. The data is divided into Web-level and file-level data. The following is a complete list of the usage statistics gathered, along with an explanation of each statistic.

Web-level Data

The Web-level data include:

Aggregated hit counts for all files in the site (found in service.cnf)

  • Visits per day / week / month. The visit statistics approximate the number of individuals who visit your site. So if a user browses to your Web site and views several files, it is all counted as one visit.

    The calculation of this statistic depends on your server configuration. If the server logs the "referer" (sic) string (for a definition of the referer string, see the Referer information section), a visit is defined as any hit where the referer is external to the current Web site. For example, if your site is at https://example.microsoft.com/, all referers that do not start with https://example.microsoft.com/ are considered external.

    Note: The URL used for this comparison ("https://example.microsoft.com") is obtained automatically from your server configuration. With certain configurations, however, it is not possible to obtain an accurate value. In this case, you will see URLs in the referrers report in FrontPage that are not external to your Web site. You can fix this by using the UsageServerUrls server property. For details, see the "Analyzing Web Site Usage" section of the SharePoint Team Services Administrator's Guide.

    If your server does not log the "referer" string, a visit is simply defined as any hit to the home page.

  • Page hits per day / week / month. These counts include hits to pages only. Hits to supporting files, such as .gif, .css, and .jpg files, are not counted.

  • Total hits per day / week / month. These counts include hits to all files including supporting files. Note that if a single HTML page displays 10 pictures, browsing to this page will increase the total hits by 11.
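The visit, page-hit, and total-hit definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's code: the function names are hypothetical, the home page is assumed to be the URI "/", a missing referer is treated as external, the list of supporting-file extensions contains only the three examples named above, and www.example.org is a made-up external site.

```python
import os

SITE_URL = "https://example.microsoft.com/"       # site root from the example above
HOME_PAGE = "/"                                   # assumption: home page URI
SUPPORTING_EXTENSIONS = {".gif", ".css", ".jpg"}  # examples from the text, not exhaustive


def is_visit(uri, referer, referer_logged):
    """Apply the visit rule: external referer if logged, else a home page hit."""
    if referer_logged:
        # Assumption: an empty referer does not start with the site URL,
        # so it is treated as external.
        return not (referer or "").startswith(SITE_URL)
    return uri == HOME_PAGE


def is_page_hit(uri):
    """Return True unless the URI points at a supporting file."""
    extension = os.path.splitext(uri)[1].lower()
    return extension not in SUPPORTING_EXTENSIONS


# One HTML page that displays 10 pictures.
hits = ["/gallery.htm"] + ["/img/pic%d.jpg" % i for i in range(10)]
total_hits = len(hits)
page_hits = sum(1 for uri in hits if is_page_hit(uri))
print(total_hits, page_hits)
```

The sample reproduces the case described above: one page with 10 pictures adds 11 to the total hits but only 1 to the page hits.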

Kilobytes downloaded (found in service.cnf)

  • Tracks the total number of kilobytes downloaded for the Web site on a per-month basis. If your log files do not include the number of bytes sent, this number will always be zero.

User agent information: (found in systems.xml and browsers.xml)

  • Tracks the types of platforms being used to browse to your web site. This data is determined by checking the user agent string found in the Web server log file. This string is sent by the browser software with each HTTP request. This data is gathered only for visits. With all other hits, the user agent string is ignored.

  • If the Operating System and Browser Reports are empty in FrontPage 2002, it is probably because your Web server is not configured to log the user-agent string. For more details, see the About Log Fields section.

Referer information: (found in _x_referer.xml, _x_refdomains.xml and _x_searchstrings.xml)

Tracks the ways that users get to your Web site. The referer string is sent by the browser software with each HTTP request. It is the URL of the page the user came from to get to your page. If the referring URL is external to your Web site, usage processing will use it to accumulate data in the following lists:

  • The Referring URL list contains the top 25 external referers sorted by frequency and grouped by months.

  • The Referring Domains list contains the top 25 external referring domains sorted by frequency and grouped by months. For example, https://example.microsoft.com/page1.htm and https://example.microsoft.com/page2.htm would both be counted under the same item in the Referring Domains list.

  • The Search Terms list contains the top 25 search terms that users entered in a search engine to find your site. FrontPage usage processing gathers this data by checking the referring domain against a list of major search engines (Yahoo, Lycos, Google, etc.). If the domain matches, it checks the rest of the URL to attempt to parse out the search term.
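As a rough illustration of how the Referring Domains and Search Terms lists could be derived from referer strings, here is a Python sketch. The engine table and its query parameter names are assumptions made for this example (the actual list used by usage processing is not documented here), and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical engine table: domain -> query parameter holding the search term.
SEARCH_ENGINES = {"google.com": "q", "yahoo.com": "p", "lycos.com": "query"}


def referring_domain(referer):
    """Return the host part of a referer URL (empty string if absent)."""
    return urlsplit(referer).hostname or ""


def search_term(referer):
    """Return the search term if the referer is a known engine, else None."""
    parts = urlsplit(referer)
    host = parts.hostname or ""
    for engine, param in SEARCH_ENGINES.items():
        if host == engine or host.endswith("." + engine):
            values = parse_qs(parts.query).get(param)
            return values[0] if values else None
    return None


def top_referring_domains(referers, limit=25):
    """Return up to `limit` (domain, count) pairs, most frequent first."""
    return Counter(referring_domain(r) for r in referers).most_common(limit)


referers = [
    "https://example.microsoft.com/page1.htm",
    "https://example.microsoft.com/page2.htm",
    "https://www.google.com/search?q=frontpage+usage",
]
print(top_referring_domains(referers))
print(search_term(referers[2]))
```

The two example.microsoft.com pages collapse into a single domain entry with a count of 2, which is the grouping the Referring Domains list describes.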

Users: (found in _x_users.xml)

  • Tracks the individual users visiting your site on a monthly basis (only counted for visits, not all hits).

  • If your Web site allows anonymous access, the user information will be very limited. You can turn off anonymous access via the Web Site Administration pages.

Domain information: (found in _x_domains.xml)

  • Tracks the client domains visiting your site on a monthly basis (only counted for visits, not all hits).

  • FrontPage 2002 does not use this information. There is no 'Domains' report. You can examine this data manually or write your own application to work with it. If you do not want this information, make sure that your Web server log file does not contain the client domain.

File-level Data

Additionally, usage processing counts hits for every file on a daily, weekly, and monthly basis. Daily data is kept for the past eight days. Weekly data is kept for the past five weeks. Monthly data is kept for the number of months specified by the usageanalysislogexpiry setting.

About Log Fields

You can configure your Web server to log only the fields you are interested in. However, usage processing requires a few fields to be present. Other fields are optional; if they are missing, some data will not be present in the FrontPage 2002 usage reports. The following table lists the log fields that affect usage processing. Fields absent from this table are ignored during usage processing. The first three fields are required for usage processing.

| Field Name | Comments | Apache directive |
| --- | --- | --- |
| Date/Time | The Date and Time fields are used to determine the day, week, and month of a particular hit. If the log file name contains the full date, the date/time is optional. This field is required for usage processing. | %t |
| URI Stem | The URI-Stem is the part of the URL that appears after the domain name. It is used to determine which file and Web site a hit is assigned to. This field is required for usage processing. | %U or %r |
| Protocol Status | The HTTP Status code is sent by the Web server to the browser to indicate the success or failure of the request. Usage processing requires this field so that only successful hits are counted. This field is required for usage processing. | %s |
| User Name | The User Name field indicates the user that made the HTTP request. If the user name field is logged, usage processing will track the number of visits made by each user. Note that if your Web site allows anonymous access, very little user information will be obtained. | %u |
| Bytes Sent | The Bytes Sent field indicates the number of bytes sent from the Web server to the Web browser. If this field is logged, usage processing will track the total number of bytes sent for each Web site. | %b |
| User Agent | The User Agent string is sent by the browser and indicates what operating system and browser are being used to make the HTTP request. If this field is logged, usage processing will track the types of operating systems and browsers used to access your Web site. | %{User-Agent}i |
| Referer | The Referer string is sent by the browser with each request. It is the URL of the page that referred the request. Usage processing uses this field to calculate visits and to track the referring URLs, referring domains, and search terms. | %{Referer}i |
| Client IP Address | If the Client IP Address is logged, usage processing will track the IP addresses of the machines making HTTP requests to your Web site. This information is not displayed in FrontPage 2002, so typically you should not log this field. | %h |
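To show how these fields appear in a log entry, the following Python sketch parses one line in the Apache "combined" format (%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i") into the fields listed above. It is an illustration only: the regular expression, the function name, and the sample line (including the address 192.0.2.1 and the user name alice) are assumptions for this example, and it is not the parser that usage processing uses.

```python
import re

# One named group per field of the "combined" log format.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # %r is "METHOD URI PROTOCOL"; the URI stem is the URI without its query.
    request_parts = fields["request"].split()
    uri = request_parts[1] if len(request_parts) >= 2 else ""
    fields["uri_stem"] = uri.split("?", 1)[0]
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


sample_line = (
    '192.0.2.1 - alice [10/Oct/2001:13:55:36 -0700] '
    '"GET /index.htm?x=1 HTTP/1.0" 200 2326 '
    '"https://example.microsoft.com/start.htm" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"'
)
parsed = parse_line(sample_line)
print(parsed["uri_stem"], parsed["status"], parsed["bytes"], parsed["user"])
```

The referer and user agent are matched between quotes because, as with any field that can contain spaces, they cannot be delimited by spaces alone.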

Server Configuration

Windows 2000 / IIS5

The following instructions outline how to configure IIS to log only the information used by usage analysis.

  1. In Internet Services Manager, right click Default Web Site and then click Properties.

  2. Check the Enable Logging check box.

  3. Set the Active log format to W3C Extended Log File Format.

  4. Click Properties for the log format.

  5. Click the Extended Properties tab.

  6. Enable the following items: Time, User Name, URI Stem, Protocol Status, Bytes Sent, User Agent, and Referer.

    Note: The process accounting log fields in IIS5 disrupt usage processing. You should turn off all fields under the "Process Accounting" section. (This may be fixed in a service release.)

  7. By default, usage processing is set to run weekly. You may want to change this to daily or monthly with the Web Site Administration Pages for your site. The usage processing configuration page is at https://<server>/_vti_bin/_vti_adm/fpadmdll.dll?page=usage.htm

Apache 1.3.3+

Usage processing with Apache has the following server configuration requirements.

  • **Log format:** The Apache log format must have each field separated by characters that do not appear in the field itself. For instance, the User Agent string can be delimited by quotes but not by spaces, because the string itself frequently contains spaces. The "combined" log format is recommended. You can use the common log format, but less usage information will be available.

  • **Log Rolling:** It is recommended that you use the cronolog facility to roll your Apache logs. Usage processing will not work properly with rotatelogs because the LogFileLocation parameter does not have a seconds format specifier. The following instructions outline how to configure your Apache server to roll logs using cronolog:

Example Apache Logging Configuration:

  1. Download, compile, and install cronolog according to the cronolog instructions.

  2. Add the following lines to the httpd.conf file for your Web server:

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    CustomLog "|<path to cronolog> <path to log directory>/access%m-%d-%y.log" combined

  3. Restart your Web server.

  4. Run the following: /usr/local/frontpage/version5.0/bin/owsadm.exe -o setproperty -pn logfilelocation -pv "<path to log directory>/access%2m-%2d-%y.log"

  5. Run the following: /usr/local/frontpage/version5.0/bin/owsadm.exe -o setproperty -pn logrollover -pv "daily at 00:00"
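cronolog expands strftime-style specifiers in its file name template, so the access%m-%d-%y.log template yields one log file per day. The short Python sketch below shows the file name to expect for a given date. The helper name is hypothetical, and it models only the cronolog template, not the %2m and %2d specifiers of the logfilelocation property.

```python
from datetime import date

# File name part of the cronolog template from the CustomLog directive.
TEMPLATE = "access%m-%d-%y.log"


def log_name_for(day):
    """Return the log file name cronolog would use for the given day."""
    return day.strftime(TEMPLATE)


print(log_name_for(date(2002, 3, 5)))  # access03-05-02.log
```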

Scheduling Usage Processing on UNIX

You should use the UNIX cron facility to schedule regular processing of your Web server log files. You need to schedule a job as root to execute /usr/local/frontpage/version5.0/bin/owsadm.exe -o usage [-m host-header -p port].
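As an illustration, an entry along the following lines in root's crontab (edited with crontab -e) would run usage processing every Sunday at midnight. The schedule is an assumption chosen to mirror the Windows default described earlier; adjust it to match your log rollover.

```
# minute hour day-of-month month day-of-week  command
0 0 * * 0 /usr/local/frontpage/version5.0/bin/owsadm.exe -o usage
```

For a daily run, the schedule fields would be 0 0 * * * instead.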

If you have multiple Web servers on the same machine and want usage processing to run at the same time for all of them, you will need to build a script that loops through all Web servers. Here is an example of such a script written in Perl:

#!/usr/local/bin/perl
# This sample code is provided "as is" without warranty of any kind.
# You are solely responsible for your use of this sample code and for any
# results of your use of this sample code.
use strict;
use warnings;

# Collect the per-server .cnf files from the FrontPage directory.
opendir(FPDIR, "/usr/local/frontpage") or
    die("Cannot open /usr/local/frontpage");
my @cnfFiles = grep /\.cnf$/, readdir FPDIR;
closedir FPDIR;

# File names look like we<port>.cnf or <host-header>:<port>.cnf.
foreach my $file (@cnfFiles)
{
    if ($file =~ /(we|.*\:)(\d+)\.cnf/)
    {
        my $hostHeader = '';
        if ($1 ne 'we')
        {
            chop($hostHeader = $1);    # drop the trailing colon
        }
        my $port = $2;
        LaunchUsage($port, $hostHeader);
    }
}

# Run usage processing for one Web server and print the tool's output.
sub LaunchUsage
{
    my ($port, $hostHeader) = @_;
    my $cmdLine = "/usr/local/frontpage/version5.0/bin/owsadm.exe -o usage -p $port";
    $cmdLine .= " -m $hostHeader" if ($hostHeader);
    print `$cmdLine`;
}