beconfig.xml reference

 

Applies to: FAST Search Server 2010

Use beconfig.xml to configure options for the browser engine component in Microsoft FAST Search Server 2010 for SharePoint. For example, use beconfig.xml to alter browser engine cache sizes or time-out settings.

The browser engine reads the beconfig.xml file in <FASTSearchFolder>\etc on startup.

Customizing beconfig.xml

Note

To modify a configuration file, verify that you meet the following minimum requirements: You are a member of the FASTSearchAdministrators local group on the computer where FAST Search Server 2010 for SharePoint is installed.

Use a text editor, such as Notepad, rather than a general-purpose XML editor to change beconfig.xml.

To edit this file:

  1. Edit beconfig.xml in a text editor to specify settings. Use the existing file in <FASTSearchFolder>\etc\ as a starting point. Do not remove any attribute sections from the file.

  2. Run nctrl.exe restart browserengine to restart the browser engine process so that the new options take effect.

beconfig.xml quick reference

The following table contains a list of the elements in beconfig.xml. These elements can appear in any order, but must occur inside other elements as specified in this table.

Element Description

<browserengine>

Identifies this as a browser engine configuration file.

<browser>

Specifies options for the virtual Web browser window. Can only occur inside a browserengine element.

<proxy>

Specifies options for the internal proxy server. Can only occur inside a browserengine element.

<process>

Specifies options that affect the processing of individual items. Can only occur inside a browserengine element.

<excludes>

Contains one or more regexp elements, which specify regular expression rules that are used to exclude particular URIs from processing. Can only occur inside a browserengine element.

<regexp>

Specifies a regular expression exclude rule. Can only occur inside an excludes element.

<pipeline>

Specifies the processing pipeline options, and the pipeline steps to be performed on each item that is processed. Contains one or more extractor elements. Can only occur inside a browserengine element.

<extractor>

Specifies an extractor. Must contain both a type and an assembly element, and may contain a parameters element. Can only occur inside a pipeline element.

Note

The list of extractors and their sub-elements, as provided in <FASTSearchFolder>\etc\beconfig.xml, must not be altered.

beconfig.xml file format

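An XML element in beconfig.xml that has no child elements begins with < and ends with />. Elements that contain other elements, such as excludes and pipeline, use separate opening and closing tags.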

The basic element format is as follows:

<element_name [attribute_name="value"] [attribute_name="value"] … />

For example:

<process maxOperations="1000" maxMemoryMB="1024" timeout="300" />

Elements and attributes are case-sensitive. Attribute values must be enclosed in quotation marks (" ") and are not case-sensitive.

An element definition can span multiple lines. Spaces, carriage returns, line feeds, and tab characters are ignored in an element definition. For example:

<process
    maxOperations="1000"
    maxMemoryMB="1024"
    timeout="300"
/>

For long element definitions, position attributes on separate lines and use indentation to make the file easier to read.

The basic structure of the beconfig.xml file is as follows:

<?xml version="1.0"?>
<browserengine>
    <browser ... />
    <proxy ... />
    <process ... />
    <excludes>
        ...
    </excludes>
    <pipeline>
        ...
    </pipeline>
</browserengine>

Comments can be added anywhere and are delimited by <!-- and -->.
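
The following sketch combines the basic structure with a comment that records why a setting was changed. The time-out value of 120 is an arbitrary illustration, not a recommended setting, and the ellipsis stands for the remaining elements, which are left as they appear in the installed file.

```xml
<?xml version="1.0"?>
<browserengine>
    <!-- Raised from the default of 60 seconds; value chosen for illustration only -->
    <browser width="1280" height="1024" visible="false" images="false" timeout="120" />
    ...
</browserengine>
```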

browserengine element

The top-level element of the file. All other elements occur inside it.

Attributes

None

browser element

This element specifies options for the embedded Web browser component within the browser engine. Use this element to adjust the time-out period for loading Web pages. For example, increase the time-out value if Web pages frequently time out during loading.

Attributes

Attribute Value Description

width

<pixels>

Web pages are rendered in an invisible Web browser window. This option specifies the width of this window in pixels.

Default: 1280

height

<pixels>

Specifies the height of the invisible Web browser window in pixels.

Default: 1024

visible

true|false

true: Makes the Web browser window visible during processing. Use for debugging only.

false: Makes the Web browser window invisible during processing.

Default: false

images

true|false

true: Specifies that the browser engine should load the images contained on Web pages. Use for debugging only.

false: Specifies that the browser engine should not load the images contained on Web pages.

Default: false

timeout

<seconds>

Specifies the time-out period, in seconds, for the browser engine to load the Web page being processed. If a Web page takes longer to load, it will be discarded.

This option does not account for the time taken to run the processing pipeline after loading is completed.

Default: 60

Example

<browser width="1280" height="1024" visible="false" images="false" timeout="60"/>
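
For a debugging session, the visible and images attributes can be switched on together. The following is a sketch that assumes the default window size and time-out are kept. Because both attributes are described as debugging-only, set them back to false when the session ends.

```xml
<!-- Debugging only: show the browser window and load images -->
<browser width="1280" height="1024" visible="true" images="true" timeout="60"/>
```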

proxy element

This element specifies options for the internal Web proxy and memory cache used by the browser engine. Use this element to adjust the cache size and maximum age of JavaScripts in the cache.

Attributes

Attribute Value Description

maxsize

<bytes>

Specifies the maximum size of a single JavaScript that will be downloaded from the Web or the Web crawler. Items that exceed this threshold will be discarded.

Default: 10485760

timeout

<seconds>

Specifies the time-out period, in seconds, for any JavaScript or Web page downloaded from the Web or the Web crawler. If a download exceeds this time-out, it will be discarded.

Default: 60

cacheSize

<megabytes>

Specifies the maximum size of the JavaScript cache within the browser engine. It is used for keeping frequently used JavaScripts available without re-downloading them.

Default: 25

cacheTTL

<seconds>

Specifies the maximum age, in seconds, of JavaScripts in the cache before they are evicted. A JavaScript may be evicted earlier if the cache fills up.

Default: 3600

Example

<proxy maxsize="10485760" timeout="60" cacheSize="25" cacheTTL="3600"/>
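
If the same scripts are shared by many pages, a larger and longer-lived cache may reduce repeated downloads. The sketch below doubles the default cache size and cache lifetime. Both values are illustrative assumptions, and this reference does not state upper limits for either attribute.

```xml
<!-- cacheSize is in megabytes, cacheTTL in seconds; maxsize and timeout keep their defaults -->
<proxy maxsize="10485760" timeout="60" cacheSize="50" cacheTTL="7200"/>
```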

process element

This element specifies options that relate to the processing of Web items in the browser engine. Use this element to adjust the maximum memory usage and the pipeline time-out period.

Attributes

Attribute Value Description

maxOperations

<operations>

Specifies the maximum number of Web pages to be processed before the browser engine automatically restarts. This is useful to handle potential memory leaks and stuck processing that may be caused by some Web pages.

Default: 1000

maxMemoryMB

<megabytes>

Specifies the maximum memory usage, in MB, before the browser engine automatically restarts. This is useful to handle potential memory leaks and stuck processing that may be caused by Web pages.

Default: 1024

timeout

<seconds>

Specifies the time-out period, in seconds, for extracting hyperlinks from any specific Web page. This time-out is required to handle cases in which, for example, a JavaScript prevents the processing pipeline from completing processing of a Web page.

Default: 300

Example

<process maxOperations="1000" maxMemoryMB="1024" timeout="300"/>
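
On a server with less memory to spare, the restart thresholds could be lowered so that the browser engine restarts sooner. The values below are illustrative assumptions rather than recommendations.

```xml
<!-- Restart thresholds halved from the defaults of 1000 pages and 1024 MB; time-out unchanged -->
<process maxOperations="500" maxMemoryMB="512" timeout="300"/>
```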

excludes element

This element specifies one or more regular expression rules used to prevent the download of specific JavaScript and cascading style sheet URIs. A typical use excludes known advertising scripts to speed up Web page processing and to prevent the scripts from appearing in the content index.

Attributes

None

Example

<excludes>
    <regexp value="http://ads\."/>
</excludes>

regexp element

This element specifies a single regular expression exclude rule and can only occur inside an excludes element. This element can occur multiple times.

Attributes

Attribute Value Description

value

<regexp>

Specifies a regular expression that is matched against all external JavaScript and cascading style sheet URIs discovered during processing the Web item. URIs matching the regular expression are not downloaded or included during Web page processing.

Default: See <FASTSearchFolder>\etc\beconfig.xml for the default value.

Example

See excludes element example.
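
Because regexp can occur multiple times, several rules can be listed in one excludes element. In the sketch below, the first rule is taken from the excludes element example, and the second rule uses a hypothetical host name (tracker.example.com) purely for illustration. This reference does not state the regular expression dialect or whether a rule must match the whole URI, so test any new rule before relying on it. Rules already present in the installed file can stay in place next to new ones.

```xml
<excludes>
    <!-- Rule from the excludes element example -->
    <regexp value="http://ads\."/>
    <!-- Hypothetical rule: skip scripts served from an assumed tracking host -->
    <regexp value="http://tracker\.example\.com/"/>
</excludes>
```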

pipeline element

This element specifies the set of extractors that are executed on each Web page during processing in the browser engine. An extractor performs a set of operations, such as extracting a certain kind of hyperlink, extracting HTTP cookies, generating a checksum, or generating the final item HTML used for content indexing.

Attributes

Attribute Value Description

name

default

Specifies the name of the pipeline. Only a single pipeline is supported and the name must be "default".

maxFrameLevels

<levels>

Specifies the number of HTML frame levels to process. Normally this option is set to 1, which means that only the top level frame and its immediate child frames (the frameset) are processed.

Increasing this number will recursively process multiple frame sets.

Default: 1

timeout

<seconds>

Specifies the maximum time that the processing pipeline can run on a single Web page before it is stopped.

Increasing this value will decrease browser engine throughput, but can help reduce Web page processing time-outs. Decreasing the value may improve throughput at the expense of possibly more time-outs.

Default: 300

iterations

1

Specifies the number of iterations to run the pipeline on each Web page. Only one iteration is supported.

abortOnFailure

true|false

true: Specifies that the processing of a Web page should be stopped if any single extractor fails.

false: Specifies that the processing of a Web page should continue even if some extractors fail. This may improve link extraction, but can (in the worst case) lead to partial items being sent to the content index.

default

true

Specifies that this pipeline is the default pipeline. Because only one pipeline is supported, this value must always be set to "true".

Example

<pipeline name="default" maxFrameLevels="1" timeout="180" iterations="1" abortOnFailure="true" default="true">
..
</pipeline>
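
The following sketch shows a pipeline that processes one additional level of nested frames and continues when an extractor fails. The attribute values are illustrative assumptions. The ellipsis stands for the extractor elements, which stay exactly as they are in the installed file.

```xml
<pipeline name="default" maxFrameLevels="2" timeout="300" iterations="1" abortOnFailure="false" default="true">
    <!-- extractor elements from the installed file go here, unchanged -->
    ...
</pipeline>
```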

extractor element

This element specifies a single extractor in the pipeline. The list of extractors as provided in <FASTSearchFolder>\etc\beconfig.xml must not be altered.