Excludes

You can exclude certain types of data from your imports. Excluding data keeps the size of your Data Warehouse manageable and makes the page-request counts in your reports more accurate. You can exclude the following types of data:

  • Hosts. You can exclude data from users who originate from certain hosts (for example, employees or software testers). This feature is typically used to exclude internal hosts. However, you can specify any host for exclusion.

    The Data Warehouse will not import hits associated with hosts that you exclude. The hits associated with the excluded hosts are, however, included in some aggregation-only reports, such as Summary Hit Counts and Bandwidth Data.

    When you enter a host name to exclude, you must type a string that Commerce Server 2000 can use to search the log files. For example, www.microsoft.com will exclude information from users whose host is www.microsoft.com. Do not include spaces in the name you type.

  • File types. A single page on your Web site may consist of several files, but you want each page request to be counted once rather than once for every file the page contains. For example, if a page includes graphics that are stored in separate files but are logically part of the page, exclude the graphics files from the import for more accurate hit counts. To exclude files by type, set the Excludes property to the file name extensions of the types of files you want to exclude.

    By default, files with the extensions .gif, .jpg, .jpeg, .css, and .cdf are excluded from Commerce Server imports. You can also remove file extensions from the exclude list. The name you type is the string that Commerce Server will search for in the log files. You must type the extension exactly.

  • File expressions. In addition to excluding a file from the import by its file name extension, you can exclude a file by its file path. You can provide a complete file expression for an individual file, or you can use wildcard characters to exclude a group of files whose paths share a common pattern.

    File expressions are not case sensitive. Do not include an underscore (_) or spaces. You must type the exact string for which you want Commerce Server to search.

    The following table lists the wildcard characters you can use in file expressions.

    Character   Description                                              Example
    *           Matches any number of characters in a character string   t* matches Test, total, and Terrific
    ?           Matches any single alphanumeric character                Te?t matches Test and text

  • Crawlers. Search engines rely on crawlers that traverse the World Wide Web checking for certain content, titles, keywords, and so on. By default, crawlers are excluded from import so that hits by Internet search engines, robots, and other automated user agents are not imported into the Data Warehouse. (These hits are, however, still included in reports such as Summary Hit Counts and Bandwidth Data.)

    You can find a list of crawlers that are excluded by default in the Commerce Server root directory, in the file Crawler.ini. The default list includes the major crawlers that are excluded by the major Web auditing organizations. You can add new crawlers to the exclude list or restore crawlers for import; however, restoring crawlers may reduce your overall system performance.
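The following is a minimal Python sketch of how the four exclusion categories described above might be applied to parsed log hits. It is not Commerce Server's implementation. The Hit and ExcludeRules types, the sample host names, and the crawler string "ExampleBot" are hypothetical; host names are assumed to match exactly (ignoring case), crawler entries are assumed to match as substrings of the user agent, and file expressions are assumed to match the entire request path. Only the default extension list and the wildcard rules come from this topic.

```python
import re
from dataclasses import dataclass, field

# Default file name extensions excluded from import, per this topic.
DEFAULT_EXCLUDED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".css", ".cdf"}


def wildcard_to_regex(expression):
    """Translate a file expression into a case-insensitive regex.

    '*' matches any number of characters; '?' matches exactly one
    alphanumeric character. All other characters match literally.
    """
    parts = []
    for ch in expression:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append("[A-Za-z0-9]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


@dataclass
class Hit:
    """Hypothetical parsed log record: one request from one client."""
    host: str
    path: str
    user_agent: str


@dataclass
class ExcludeRules:
    """Hypothetical container for the four exclusion categories."""
    hosts: set = field(default_factory=set)
    extensions: set = field(default_factory=lambda: set(DEFAULT_EXCLUDED_EXTENSIONS))
    expressions: list = field(default_factory=list)
    crawler_agents: list = field(default_factory=list)

    def excludes(self, hit):
        # Assumption: host names match exactly, ignoring case.
        if hit.host.lower() in {h.lower() for h in self.hosts}:
            return True
        # File types: compare the end of the path with each extension.
        if any(hit.path.lower().endswith(ext.lower()) for ext in self.extensions):
            return True
        # Assumption: a file expression must match the whole request path.
        if any(wildcard_to_regex(e).fullmatch(hit.path) for e in self.expressions):
            return True
        # Assumption: a crawler entry matches as a substring of the user agent.
        agent = hit.user_agent.lower()
        return any(c.lower() in agent for c in self.crawler_agents)


def import_hits(hits, rules):
    """Return (hits imported into the warehouse, total raw hit count).

    The total stands in for aggregation-only reports, which still see
    excluded hits; this sketch simply counts every raw hit.
    """
    imported = [h for h in hits if not rules.excludes(h)]
    return imported, len(hits)


rules = ExcludeRules(
    hosts={"tester.example.com"},
    expressions=["/examples/*"],
    crawler_agents=["ExampleBot"],
)
hits = [
    Hit("client1.example.net", "/default.asp", "Mozilla/4.0"),
    Hit("client1.example.net", "/images/logo.gif", "Mozilla/4.0"),
    Hit("tester.example.com", "/default.asp", "Mozilla/4.0"),
    Hit("client2.example.net", "/Examples/demo.asp", "Mozilla/4.0"),
    Hit("crawl.example.org", "/default.asp", "ExampleBot/1.0"),
]
imported, total = import_hits(hits, rules)
print([h.path for h in imported], total)  # ['/default.asp'] 5
```

In this sketch, each of the four excluded hits is removed by a different category (file type, host, file expression, crawler), while the raw total of five hits is still available for aggregate counts.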

The Excludes tab of the Advanced Web Log Import Properties dialog box provides a drop-down list from which you can choose the type of data you want to work with. Select the All option from the list if you want to see a list of all the exclusions.

By default, excluding data from import excludes that data for all applications running on your site. You can also exclude data from specific applications, while including similar data from other applications on your site.

For example, suppose you have two site applications, Application1 and Application2, and each contains a directory named /Examples. You can exclude the content of the /Examples directory for both applications, or you can exclude only the content of the /Examples directory of Application1. In the latter case, the content in the /Examples directory of Application2 continues to be imported.
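The application scoping described above can be sketched as follows, assuming each exclusion rule carries an optional set of application names, where None stands for "all applications within a site." ScopedExclude, matches, and is_excluded are hypothetical names used only for illustration, and the wildcard translation follows the rules in the file expressions table.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional


def matches(expression, path):
    """Case-insensitive match: '*' = any characters, '?' = one alphanumeric."""
    pattern = "".join(
        ".*" if ch == "*" else "[A-Za-z0-9]" if ch == "?" else re.escape(ch)
        for ch in expression
    )
    return re.fullmatch(pattern, path, re.IGNORECASE) is not None


@dataclass(frozen=True)
class ScopedExclude:
    expression: str
    # None models "All applications within a site"; a set of names
    # models "Selected applications within a site".
    applications: Optional[FrozenSet[str]] = None

    def excludes(self, application, path):
        in_scope = self.applications is None or application in self.applications
        return in_scope and matches(self.expression, path)


def is_excluded(application, path, rules):
    """A hit is excluded if any rule in scope for its application matches."""
    return any(rule.excludes(application, path) for rule in rules)


# Exclude /Examples content for Application1 only.
rules = [ScopedExclude("/Examples/*", frozenset({"Application1"}))]
print(is_excluded("Application1", "/Examples/demo.htm", rules))  # True
print(is_excluded("Application2", "/Examples/demo.htm", rules))  # False
```

Omitting the application set (ScopedExclude("/Examples/*")) would exclude the /Examples content for both applications, which mirrors the default behavior of excluding data for all applications on the site.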

To add an import exclusion

  1. In SQL Server Enterprise Manager, expand Microsoft SQL Server, expand SQL Server Group, and then expand the server on which your Data Warehouse is installed.

  2. Right-click Data Transformation Services, and then click New Package.

    Alternatively, if you are changing an existing package, right-click Data Transformation Services, click All Tasks, and then select Open Package. Select the package you want to change, and then click Open.

  3. On the Task menu, click Web server log import (Commerce Server).

  4. In the Import Web Server Logs dialog box, click Advanced.

  5. In the Advanced Web Log Import Properties dialog box, on the Excludes tab, do the following:

    Use this                                                To do this
    Include default crawler list for the exclude criteria   Select this check box to prevent crawlers on the default crawler list from being imported.
    Add                                                     Click to open the Excludes dialog box to create a new exclude item.

  6. In the Excludes dialog box, do the following:

    Use this                              To do this
    Select exclude category               Select the type of exclusion you want to add from the drop-down list.
    Name                                  Type the string Commerce Server will search for in the log files. For example, type *.gif to exclude GIF graphics files. Commerce Server will not import items that match the string you type. The string cannot contain spaces.
    All applications within a site        Select this option if you want to exclude the item from all applications within a site.
    Selected applications within a site   Select this option if you want to specify an application from which to exclude the item.
    Select all                            Click this button to exclude the data from all of the applications listed above it.
    Deselect all                          Click this button to clear the selection of all of the applications listed above it.

  7. Click OK to close the Excludes dialog box and return to the Excludes tab of the Advanced Web Log Import Properties dialog box.

  8. Continue to add exclusions. When you are finished, click Apply, and then click OK to save your additions and close the dialog box.

To delete an import exclusion

  1. In SQL Server Enterprise Manager, expand Microsoft SQL Server, expand SQL Server Group, and then expand the server on which your Data Warehouse is installed.

  2. Right-click Data Transformation Services, and then click New Package.

    Alternatively, if you are changing an existing package, right-click Data Transformation Services, click All Tasks, and then select Open Package. Select the package you want to change, and then click Open.

  3. On the Task menu, click Web server log import (Commerce Server).

  4. In the Import Web Server Logs dialog box, click Advanced.

  5. In the Advanced Web Log Import Properties dialog box, on the Excludes tab, do the following:

    Use this       To do this
    Exclude Type   Select the exclude item you want to delete.
    Remove         Click to remove the selected exclude item.

  6. Click Apply, and then click OK to close the dialog box.

The import exclusions are configured for the Web log file import process. You can continue to configure the Web log file import properties, or you can import data into the Data Warehouse. You must synchronize your site configuration with the Data Warehouse before you import data into the Data Warehouse.

See Also

Data Warehouse Components

Configuring Web Log File Import Properties

Importing Data into the Data Warehouse


All rights reserved.