crawler.exe reference

Applies to: FAST Search Server 2010 for SharePoint

Use the crawler.exe binary to start a FAST Search Web crawler. You can specify options to start the crawler as a stand-alone single node crawler or as part of a distributed multiple node crawler (as a multi-node scheduler or as a node scheduler).

Note

To use a command-line tool, verify that you meet the following minimum requirements: You are a member of the FASTSearchAdministrators local group on the computer where FAST Search Server 2010 for SharePoint is installed.

Syntax

<FASTSearchFolder>\bin\crawler [options]

Parameters

Parameter Description

<FASTSearchFolder>

The path of the folder where you have installed FAST Search Server 2010 for SharePoint, for example C:\FASTSearch.

All options are optional.

crawler basic options

Option Value Description

-h

Displays help information.

-v

Displays version information.

-P

[<hostname>:]<crawlerbaseport>

Specifies the crawler base port (XML-RPC interface).

Use this option when you run several instances of the crawler on the same node.

  • <hostname> sets the bind address for XML-RPC interfaces (optional); can be a host name or an IP address (some hosts have multiple IP addresses).

  • <crawlerbaseport> sets the start of a port number range available to the crawler.

Default: 13000
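
For example, to start a second crawler instance that binds to a specific host name and uses base port 14000 (both values are illustrative):

<FASTSearchFolder>\bin\crawler -P crawler1.contoso.com:14000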

-d

<path>

Specifies the data storage directory.

Use this option to store crawl data, runtime configuration, and logs in subdirectories in the specified directory.

Default: If the %FASTSEARCH% environment variable is set, the default path is %FASTSEARCH%\data\crawler; otherwise the default path is data.
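
For example, to store crawl data, runtime configuration, and logs under a dedicated directory (the path is illustrative):

<FASTSearchFolder>\bin\crawler -d D:\crawlerdata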

-f

<XML config file>

Specifies the crawl collection configuration(s).

Use this option to specify the location of an XML file that contains one or more crawl collections.

The crawler will parse the contents of this file, add or update the specified crawl collections, and start crawling.
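
For example, to add the crawl collections in a configuration file and start crawling (the file name is illustrative):

<FASTSearchFolder>\bin\crawler -f C:\FASTSearch\etc\MyCollection.xml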

-c

<number>

Specifies the number of site manager processes to start. The value must be equal to or less than the number of clusters that are defined in the crawl collection specification.

For larger crawls, specify a process count that is equal to or greater than the number of your CPUs.

A maximum of 8 processes is supported.

Default: 2
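
For example, to start four site manager processes, assuming that the crawl collection specification defines at least four clusters:

<FASTSearchFolder>\bin\crawler -c 4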

crawler advanced options

Option Value Description

-D

<number>

Specifies the maximum DNS requests per second.

The FAST Search Web crawler has a built-in DNS lookup facility that can communicate with one or more DNS servers to perform DNS lookups. Use this option to limit the number of DNS requests per second that the crawler sends to the DNS server(s).

The DNS resolver will automatically decrease the lookup rate if it determines that the DNS server is unable to handle the current rate. The actual rates are reported in the collection statistics output.

Default: 100 requests per second
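
For example, to limit the crawler to 50 DNS requests per second (an illustrative rate):

<FASTSearchFolder>\bin\crawler -D 50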

-F

<file>

Specifies the crawler global configuration file.

Use this option to specify the location of an XML file that contains the crawler's global configuration, which may contain default values for all command-line options.

Many options can be specified both in the XML configuration file and the crawler command-line tool. Options specified in the command-line tool take precedence.

At startup, the crawler will look for a default CrawlerGlobalDefaults.xml configuration file, located in the current directory or the <FASTSearchFolder>\etc folder.
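
For example, to load the global configuration from an explicit location instead of relying on the default lookup:

<FASTSearchFolder>\bin\crawler -F C:\FASTSearch\etc\CrawlerGlobalDefaults.xml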

-n

Shuts down the crawler when idle.

For the crawler to become idle, the refresh setting in a crawl collection must be higher than the time that is required to crawl the complete collection.

Default: disabled
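
For example, to start crawling from a configuration file and shut down when the crawler becomes idle (the file name is illustrative):

<FASTSearchFolder>\bin\crawler -n -f C:\FASTSearch\etc\MyCollection.xml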

-T

Enables profiling.

Use for debugging only.

-t

Enables profiling using the hotshot module.

Use for debugging only.

crawler logging options

Option Value Description

-L

<path>

Specifies the log storage directory.

Use this option to store crawler-specific logs in subdirectories of the specified directory.

Default: If the %FASTSEARCH% environment variable is set, the default path is %FASTSEARCH%\var\log\crawler; otherwise the default path is data\log.
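
For example, to store crawler logs on a separate drive (the path is illustrative):

<FASTSearchFolder>\bin\crawler -L D:\crawlerlogs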

-q

Disables verbose logging.

Use this option to log only CRITICAL, ERROR, and WARNING log messages.

-l

<log level>

Specifies the kind of information to log:

  • debug

  • verbose

  • info

  • warning

  • error
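
For example, to enable debug logging during troubleshooting:

<FASTSearchFolder>\bin\crawler -l debug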

crawler integration options

Option Value Description

-o

Enables FAST Search Server 2010 for SharePoint mode.

Use this option when you run the crawler in a FAST Search Server 2010 for SharePoint environment.

-i

Ignores the configuration server. The crawler continues to run even if the configuration server is unreachable.
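
For example, to run the crawler in FAST Search Server 2010 for SharePoint mode and keep it running even if the configuration server is unreachable:

<FASTSearchFolder>\bin\crawler -o -i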

crawler multi-node options

Option Value Description

-U

Runs the FAST Search Web crawler as a multi-node scheduler in a multiple node setup.

Node schedulers connect to this multi-node scheduler's XML-RPC port by using the -S option.

-S

<multi-node scheduler host>:<multi-node scheduler port>

Starts the crawler as a node scheduler to a multi-node scheduler.

Specifies the host name and port number of the multi-node scheduler.

Example: crawler1.contoso.com:13000

-s

Enables survival mode in a distributed setup, keeping the node scheduler alive and trying to reconnect to the multi-node scheduler until a successful connection is made.

This option only applies to the node scheduler.
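
For example, to start a node scheduler that keeps trying to reconnect to its multi-node scheduler (the host name and port are illustrative):

<FASTSearchFolder>\bin\crawler -o -P 13000 -S crawler1.contoso.com:13000 -s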

-I

<node identifier>

Specifies a symbolic name for the crawler node.

Use of this option is rare. In a multiple node crawler setup, each crawler node must have a unique symbolic name, used by crawl collection configurations to specify the crawler nodes included in a crawl.

This option only applies to the node scheduler. The default value is auto generated and stored in the configuration database. If you use this option, you only have to specify an alternative value once: the first time that the crawler is started.
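
For example, to assign the symbolic name node2 (an illustrative value) to a node scheduler the first time that it is started:

<FASTSearchFolder>\bin\crawler -o -P 13000 -S crawler1.contoso.com:13000 -I node2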

Examples

To start a stand-alone single node crawler at port 13000, follow this example:

<FASTSearchFolder>\bin\crawler -o -P 13000

This example starts a multi-node scheduler in a multiple node setup at port 13000:

<FASTSearchFolder>\bin\crawler -o -P 13000 -U

The following example starts a node scheduler in a multiple node setup on a different node, connecting it to the multi-node scheduler:

<FASTSearchFolder>\bin\crawler -o -P 13000 -S crawler1.contoso.com:13000

See Also

Concepts

crawlerglobaldefaults.xml reference