ETL Process for the Web Server Log Import DTS Task

The Web server log import DTS task imports Web log file data generated by your Web site into the Data Warehouse.

The configuration of your application and Web servers will affect the log file data imported during the Web server log import DTS task. Because log files can contain information for more than one site you may need to parse the log files on a per site or virtual server level. Therefore, each hit in the log file must be associated with a particular virtual server. The Web server log import task can be configured to filter out the appropriate virtual servers. The virtual server is identified by three properties: port number, IP address, and virtual server key. Because the secure port number (port 443) and the non-secure port (port 80) both effect the virtual server key, they are seen as two different virtual servers, and the Web server log import exclusions must be set up appropriately to include both virtual servers in a Web log import. For more information on filtering log files for the log import, see Setting Web Log File Exclusions.

Ee797037.note(en-US,CS.20).gifNotes

  • Not all lines of the log file will be imported to the Request table because many lines in the log file are caused by hits, not actual requests. Only requests are imported to the Request table. A series of filters is used to distinguish hits from requests in the log file. For more information about selecting information to exclude from the Web server log import to the Request table, see Setting Web Log File Exclusions.
  • The time overlap setting for the Web server log import DTS task will effect the requests imported to the Request table at the beginning and end of a log file. By default, the Web server log import DTS task will discard requests that overlap another log file by 30 minutes. For information about changing the time overlap settings, see Setting Time Overlap and Log File Rotation Properties.

The following tables list the columns extracted from the W3C log file, the transformation made, and the tables the data is saved to in the Data Warehouse.

Source columns from the W3C log file Transformation Target columns from the Request table in the Data Warehouse
N/A
Generated

Generated by the log import application. Incremented for each request line of a log file that gets stored. This is unique across different imports.

RequestNum
cs-uri-query
Copy Column

Contains the entire query string from the log file.

QueryString
cs-uri-stem
Copy Column
Uri
cs-bytes
Copy Column

Client to server bytes.

BytesReceived
sc-bytes
Copy Column

Server to client bytes.

BytesSent
N/A
Generated

This column is generated within the log import application as a unique identifier.

VisitNum
date and time
Concatenation

Convert to local time.

TimeStamp
N/A
Generated

This is the key member of the SiteURL class. The OLE DB provider will resolve this to SiteURLID in the physical SiteURL table.

URL
N/A
Generated

Determines if this entry is considered a Request (versus a Hit).

This can be set to false when the hit is filtered. The most common filtered hit is when an image is requested and it has a Commerce Event attached to it.

IsRequest
cs(Referer)
Copy Column
ReferrerURL
cs(Referer)
Middle of String

Takes only the domain portion of the URI.

ReferrerDomainName
cs-uri-query (UPM cookieless)
OR
cs(cookie)
OR
cs-username
OR
cs-ip + cs(User-Agent)
Generated

The user key is the first 8 bytes of an MD4 hash of whatever is used as the user key. What is used as the user key depends on what is in the log file, it can be the UPM query string (cookieless), UPM cookie, user configured cookie, user name, or client IP address and browser agent string.

UserKey
sc-win32-status
Copy Column
Win32Status
sc-status
Copy Column
HttpStatus
N/A
Generated

Log import will match a log line to a configured server binding (including the port number (s-port) and IP address (s-ip)). If the server binding is classified as being secure, this column will be set to true. For more information, see Configuring an Application.

IsSecure
cs-method
Generated

GET = 1, POST=2, HEAD=3

Method
time-taken
Copy Column
TimeTaken
N/A
Generated

The first request of a visit detected within the log import is 1, and successively gets incremented for each subsequent request in the visit.

RequestIndex
N/A
Generated

If log import detects there is no Commerce Event query string on this line, this will be set to false.

HasCommerceEvent
N/A
Generated

This is incremented for each instance of a log import.

TaskID
cs-uri-stem
Middle of String

Takes only the first level of the URI, if available.

Level1Dir
cs-uri-stem
Middle of String

Takes only the second level of the URI, if available.

Level2Dir
cs-uri-stem
Middle of String

Takes only the third level of the URI, if available.

Level3Dir
cs-uri-stem
Middle of String

Takes only the fourth level of the URI, if available.

Level4Dir
cs-uri-stem
Middle of String

Takes only the fifth level of the URI, if available.

Level5Dir
cs-uri-stem
Middle of String

Takes only the sixth level of the URI, if available.

Level6Dir
cs(User-Agent)
Copy Column
UserAgentName
UPM Cookie Key
Or 
cs(cookie)
Or
cs-ip + cs(User-Agent)
Generated

The UserId is the first 8 bytes of an MD4 hash of whatever is used as the user ID. What is used as the user ID depends on what is in the log file, it can be the UPM query string (cookieless), UPM cookie, user configured cookie, user name, or client IP address and browser agent string.

UserID
cs-uri-query
Copy Column

MD4 Hash

QueryStringKey
N/A
Generated

Used for check pointing support.

SubTaskID
cs-uri-stem
Copy Column

MD4 Hash

UriKey
cs(Cookie)
Middle of String

Only the matched cookie is extracted. The shipping schema does not physically store this property, although the schema can be logically extended to store it.

Cookie

Writing to the Visit Table

A visit begins when the referrer is not internal to the site. For example, if a user hits your site (<www.microsoft.com>), then clicks a link on your site that takes them to a different site (<www.msn.com>), and then returns to your site (<www.microsoft.com>), the Visit table will log two individual visits. The following table shows an example of what the log file for this user might resemble.

Time (in minutes) UserName URI Referrer
0 A /default.asp -
1 A /foo.asp <https://www.microsoft.com/default.asp>
5 A /default.asp <http://www.msn.com>

If the definition of a visit were based on time alone, this would be one visit. However, the Data Warehouse knows that the user left the site and came back by virtue of the referrer being external to the site. A substring match against the SiteURL table determines if the referrer is internal. The referrer is checked to be internal using a substring match against the SiteURL table.

Ee797037.note(en-US,CS.20).gifNote

Source columns from the W3C log file Transformation Target columns from the Visit table in the Data Warehouse
N/A
Generated

This column is generated within the log import application as a unique identifier.

This is a foreign key to the Visit table. It is also a foreign key to the VisitInfo table; however, the VisitInfo table is only inserted when the user's visit is complete.

VisitNum
date and time
Concatenation

This is the timestamp of the first request in the visit. Convert to Local Time.

Timestamp
cs(Referer)
Copy Column

Includes the query string.

ReferrerURL
cs(Referer)
Takes only the domain portion of the URI.
ReferrerDomainName
cs-uri-query (UPM cookieless)
OR
cs(cookie)
OR
cs-username
OR
cs-ip + cs(User-Agent)
Generated

The user key is the first 8 bytes of an MD4 hash of whatever is used as the user key. What is used as the user key depends on what is in the log file, it can be the UPM query string (cookieless), UPM cookie, user configured cookie, user name, or client IP address and browser agent string.

UserKey
   
UserDomainName
s-ip
Copy Column
IpAddress
N/A 
Generated

This is incremented for each instance of a log import.

TaskID
time
Middle of String
Hour
UPM Cookie Key
Or 
Custom Cookie or IP Address + Browser Agent String
Generated

MD4 Hash

UserID
   
DateCreated
cs-username
Copy Column
UserName
N/A
Generated

If the user has a cookie, this is set to True.

HasCookie
N/A
Generated

Used for check pointing support.

SubTaskID
cs(Cookie)
Takes only the matched cookies. This property is not physically stored. User can logically extend to store.
Cookie

The VisitInfo table represents closed user visits. This class is written from log import when the visit has closed. A visit is closed when an external referrer was encountered or a visit timeout occurred.

Source columns from the W3C log file Transformation Target columns from the VisitInfo table in the Data Warehouse
N/A
Generated
CountOfRequest
N/A
Generated
Duration
N/A
Generated
FirstTimeStamp
N/A
Generated
FirstHTimeStamp
N/A
Generated

This column is generated within the log import application as a unique identifier.

This is a foreign key to the Visit table. It is also a foreign key to the VisitInfo table; however, the VisitInfo table is only inserted when the user's visit is complete.

VisitNum
N/A
Generated

This is incremented for each instance of a log import.

TaskID
cs-uri-query (UPM cookieless)
OR
cs(cookie)
OR
cs-username
OR
cs-ip + cs(User-Agent)
Generated

The user key is the first 8 bytes of an MD4 hash of whatever is used as the user key. What is used as the user key depends on what is in the log file, it can be the UPM query string (cookieless), UPM cookie, user configured cookie, user name, or client IP address and browser agent string.

UserKey
cs(User-Agent)
Copy Column
UserAgentName
   
FirstRequestNum
   
SecondRequestNum
   
LastRequestNum
   
FirstUriKey
   
SecondUriKey
   
LastUriKey
cs-uri-query (UPM cookieless)
OR
cs(cookie)
OR
cs-username
OR
cs-ip + cs(User-Agent)
Generated

The user key is the first 8 bytes of an MD4 hash of whatever is used as the user ID. What is used as the user ID depends on what is in the log file, it can be the UPM query string (cookieless), UPM cookie, user configured cookie, user name, or client IP address and browser agent string.

UserID
cs(Referer)
Generated

Only the domain portion of the URI is loaded.

ReferrerDomainName
N/A
Generated

Used for check pointing support.

This is a foreign key to the LogImportSubTask table.

SubTaskID
Source columns from the W3C log file Transformation Target columns from the TaskHistory table in the Data Warehouse
N/A
Generated

This is incremented for each instance of a DTS import task.

TaskID
N/A
Generated

This identifies which import task was used.
1 = Web server log import
2 = Profile data import
3 = Campaign data import
4 = Product catalog import
5 = Transaction log import
8 = Configuration synchronization
9 = Data deletion
10 = IP resolution
11 = Report preparation
20 = Report caching

OperationID
N/A
Generated

Result of the import.
1 = Import Success
2 = Import Completed with Failure
3 = Import Incomplete.
4 = Import Aborted

Status
N/A
Generated

Physical start time of the import.

StartTime
N/A
Generated

Physical end time of the import.

EndTime
N/A
Generated

Description of the import task.

Description
N/A
Generated

Used for check pointing support for the Web log import DTS task.

SubTaskID
N/A
Generated

Used for restart for the Web log import DTS task.

NumOfTries

The LogImportSubTask class is used internally within the Web log import to track checkpoints for the restart feature.

Source columns from the W3C log file Transformation Target columns from the LogImportSubTask table in the Data Warehouse
N/A
Generated

Used for check pointing support.

SubTaskID
N/A
Generated

Incremented for each instance of a log import.

TaskID
N/A
Generated

A foreign key to the Site table.

SiteID
N/A
Generated

Server group name for which you are importing. There is a unique ServerGroup for each virtual server (for IIS, a distinct log file).

ServerGroup
N/A
Generated

A log file name within the import.

FileName
N/A
Generated

The offset within the log file where the checkpoint occurred.

ByteOffsetHigh
N/A
Generated

The offset within the log file where the checkpoint occurred.

ByteOffsetLow
N/A
Generated

The W3C fields directive at the time of the checkpoint.

Field
N/A
Generated

Current log file date and time when the checkpoint occurred.

Date

The OpenUserVisit class represents users that have active visits at the time of the end of a log import. To modify how open user visits are handled by the Web server log import DTS task you must adjust the log file rotation settings. For more information about changing log file rotation settings, see Setting Time Overlap and Log File Rotation Properties.

Source columns from the W3C log file Transformation Target columns from the OpenUserVisit table in the Data Warehouse
N/A
Generated

The count of requests in this current open visit.

CountOfRequest
N/A
Generated
FirstTimeStamp
N/A
Generated
LastTimeStamp
N/A
Generated

Generated within the log import application as a unique identifier.

VisitNum
cs-uri-query (UPM cookieless)
OR
cs(cookie)
OR
cs-username
OR
cs-ip + cs(User-Agent)
Generated

The user key is the first 8 bytes of an MD4 hash of whatever is used as the user key. What is used as the user key depends on what is in the log file, it can be the UPM query string (cookieless), UPM cookie, user configured cookie, user name, or client IP address and browser agent string.

UserKey
N/A
Generated

This is incremented for each instance of a log import.

TaskID
cs(User-Agent)
Copy Column
UserAgentName
N/A
Generated

Foreign key to the request table. The first request in this visit.

FirstRequestNum
N/A
Generated

Foreign key to the request table. The second request in this visit.

SecondRequestNum
N/A
Generated

Foreign key to the request table. The last request in this visit.

LastRequestNum
N/A
Generated

Foreign key to the URI table. The first URI in this visit.

FirstUriKey
N/A
Generated

Foreign key to the URI table. The second URI in this visit.

SecondUriKey
N/A
Generated

Foreign key to the URI table. The last URI in this visit.

LastUriKey
cs-uri-query (UPM cookieless)
OR
cs(cookie)
OR
cs-username
OR
cs-ip + cs(User-Agent)
Generated

The user ID is the first 8 bytes of an MD4 hash of whatever is used as the user key. What is used as the user ID depends on what is in the log file, it can be the UPM query string (cookieless), UPM cookie, user configured cookie, user name, or client IP address and browser agent string.

UserID
cs(Referer)
Middle of String

Only the domain portion of the URI is loaded.

ReferrerDomainName
N/A
Generated

Used for check pointing support.

SubTaskID
Source columns from the W3C log file Transformation Target columns from the SiteSummary table in the Data Warehouse
N/A
Generated

This is calculated during log import. All valid hits the parser was able to successfully parse from the log file.

TotalHits
N/A
Generated

Total number of hits that were not filtered out by exclude criteria.

TotalImportHits
N/A
Generated

Total number of visits (open and closed) that were encountered.

TotalVisits
N/A
Generated

Number of hits that were excluded by the server exclude criteria.

TotalBadServerCount
N/A
Generated

Number of hits that were excluded by not being able to match a configured site. If this number is high, check your application configuration. For more information about configuring your application, see Configuring an Application.

TotalBadSiteCount
N/A
Generated

Number of hits that were excluded by the excluded host criteria.

TotalExcludeHostCount
N/A
Generated

Total number of hits that were excluded by all criteria.

TotalExcludeCriteriaCount
N/A
Generated

Number of hits that were excluded by the crawler exclude criteria.

TotalMatchCrawlerCount
N/A
Generated

Physical start time of the import.

StartTime
N/A
Generated

Physical end time of the import.

EndTime
Source columns from the W3C log file Transformation Target columns from the LevelDir table in the Data Warehouse
N/A
Generated

First 8-bytes of MD4 Hash

URI1Key
N/A
Generated

First 8-bytes of MD4 Hash

URI2Key
N/A
Generated

First 8-bytes of MD4 Hash

URI3Key
N/A
Generated

First 8-bytes of MD4 Hash

URI4Key
N/A
Generated

First 8-bytes of MD4 Hash

URI5Key
N/A
Generated

First 8-bytes of MD4 Hash

URI6Key
Source columns from the W3C log file Transformation Target columns from the PathInfo table in the Data Warehouse
N/A
Generated
FirstTimeStamp
   
PathKey
   
PathTypeEnum
N/A
Generated

This is incremented for each instance of a log import.

TaskID
Source columns from the W3C log file Transformation Target columns from the LogUser table in the Data Warehouse
cs-uri-query (UPM cookieless)
OR
cs(cookie)
OR
cs-username
OR
cs-ip + cs(User-Agent)
Generated

The user key is the first 8 bytes of a MD4 hash of whatever is used as the user key. What is used as the user key depends on what is in the log file, it can be the UPM query string (cookieless), UPM cookie, user configured cookie, user name, or client IP address plus browser agent string.

UserKey
   
UserDomainName
   
DateCreated
cs-username
Copy Column
UserName
N/A
Generated

If the user has a cookie, this is set to True.

HasCookie
s-ip
Copy Column
IpAddress
cs-uri-query (UPM cookieless)
OR
cs(cookie)
OR
cs-username
OR
cs-ip + cs(User-Agent)
Generated

The user key is the first 8 bytes of a MD4 hash of whatever is used as the user key. What is used as the user key depends on what is in the log file, it can be the UPM query string (cookieless), UPM cookie, user configured cookie, user name, or client IP address plus browser agent string.

UserID
Source columns from the W3C log file Transformation Target columns from the URI table in the Data Warehouse
cs-uri-stem
Copy Column
URI
N/A
Generated

This is the home page URL that was matched during log import.
This is the first specified URL.

URL
cs-uri-stem
Middle of String

Takes only the first level of the URI, if available.

Level1Dir
cs-uri-stem
Middle of String

Takes only the second level of the URI (if one exists).

Level2Dir
cs-uri-stem
Middle of String

Takes only the third level of the URI (if one exists).

Level3Dir
cs-uri-stem
Middle of String

Takes only the fourth level of the URI (if one exists).

Level4Dir
cs-uri-stem
Middle of String

Takes only the fifth level of the URI (if one exists).

Level5Dir
cs-uri-stem
Middle of String

Takes only the sixth level of the URI (if one exists).

Level6Dir
cs-uri-stem
Copy Column

MD4 Hash

UriKey

See Also

Configuring the Default Web Site Properties for Import

Configuring the Web Server Log Import DTS Task

Running the Web Server Log Import DTS Task

Running the Web Server Log Import DTS Task for a Large Site

Scripting the Web Server Log Import DTS Task

Workflow for Running DTS Tasks

Extending the Data Warehouse

Data Warehouse Schema

Copyright © 2005 Microsoft Corporation.
All rights reserved.