crawleradmin.exe reference

 

Applies to: FAST Search Server 2010 for SharePoint

Use the crawleradmin tool to configure, control, and monitor crawl collections. For example, use crawleradmin to add, update, or delete crawl collections; to suspend or resume content feeding; or to monitor a crawl in progress (using the FAST Search Web crawler). A crawl collection is a set of Web sites that are crawled using the same crawl configuration.

Note

To use a command-line tool, verify that you meet the following minimum requirements: You are a member of the FASTSearchAdministrators local group on the computer where FAST Search Server 2010 for SharePoint is installed.

Syntax

<FASTSearchFolder>\bin\crawleradmin [options]

Parameters

Parameter Description

<FASTSearchFolder>

The path of the folder where you have installed FAST Search Server 2010 for SharePoint, for example C:\FASTSearch.

crawleradmin general options

Option (and short name) Value Required Description

--crawlernode (-C)

<hostname>[:<port>]

No

Manages the FAST Search Web crawler at the specified host name and, optionally, port.

Default: localhost:13000

--offline (-o)

<configuration_directory>

No

Works directly on the crawler databases (offline mode) instead of going through the crawler administrative API.

Because the databases are locked while the crawler is running, stop the crawler before you issue this command.

If you do not specify a <configuration_directory>, the command uses the default configuration directory: FASTSEARCH\data\crawler\config

Or, if the FASTSEARCH environment variable is not set: data\config

You can use this option with:

  • -a

  • -d

  • -c

  • -q

  • -G

  • -f

  • --getdata

  • --verifyuri

-l

<log-level>

No

Specifies the kind of information to log:

  • debug

  • verbose

  • info

  • warning

  • error

-h

No

Displays help information.

-v

No

Displays version information.
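
For example, to view crawl statistics for a crawl collection while the crawler is stopped, you might run crawleradmin in offline mode against the default configuration directory (MyCollection is a placeholder collection name; substitute your own collection and configuration directory):

<FASTSearchFolder>\bin\crawleradmin -o <FASTSearchFolder>\data\crawler\config -q MyCollection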

crawler configuration options

Option (and short name) Value Required Description

--addconfig (-f)

<Path_to_XML_file>

No

Adds or updates one or more crawl collection configurations from the specified XML file.

--getcollection (-G)

<crawl_collection>

No

Writes the XML configuration for the specified crawl collection (as defined in the crawler XML configuration file) to stdout. Redirect stdout to a file to save the configuration.

--delcollection (-d)

<crawl_collection>

No

Deletes a crawl collection and its stored content from the crawler.

--encrypt (-e)

<password>

No

Encrypts a password so that it can be stored securely in a crawl collection configuration file.
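
For example, to save the configuration of a crawl collection to a file by redirecting stdout (MyCollection and the file name are placeholders):

<FASTSearchFolder>\bin\crawleradmin -G MyCollection > MyCollectionConfig.xml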

crawler control options

Option (and short name) Value Required Description

--suspendcollection (-s)

<crawl_collection>

No

Suspends crawling of a specified crawl collection. Feeding of items in the feed queue continues.

--resumecollection (-r)

<crawl_collection>

No

Resumes crawling of the specified crawl collection.

--suspendfeed

<crawl_collection>[:targets]

No

Suspends content feeding of a crawl collection.

Optionally specify a comma-separated list of feeding targets (symbolic names in the crawl collection configuration).

--resumefeed

<crawl_collection>[:targets]

No

Resumes content feeding of a crawl collection.

Optionally specify a comma-separated list of feeding targets (symbolic names in the crawl collection configuration).

--enable-refreshing-crawlmode

<crawl_collection>

No

Enables the refresh crawl mode for a crawl collection.

When this mode is enabled, the crawler only refreshes (re-crawls) URIs that have previously been crawled.

--disable-refreshing-crawlmode

<crawl_collection>

No

Disables the refresh crawl mode for a crawl collection, and resumes regular crawl mode.
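
For example, to suspend and later resume content feeding for a crawl collection (MyCollection is a placeholder name):

<FASTSearchFolder>\bin\crawleradmin --suspendfeed MyCollection

<FASTSearchFolder>\bin\crawleradmin --resumefeed MyCollection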

URI submission, refetching, and refeeding options

Option (and short name) Value Required Description

--adduri (-u)

<crawl_collection>:<URI>

No

Appends the specified URI to the crawl collection work queue.

Combine with the --force option to prepend the URI and crawl it immediately.

--addurifile

<crawl_collection>:<file>

No

Appends all URIs from the specified file to the crawl collection work queue.

Combine with the --force flag to prepend the URIs and crawl them immediately.

--refetch (-F)

<crawl_collection>

No

Forces a re-fetch of a crawl collection.

The crawler deletes all existing work queues, clears all caches, starts a new crawl cycle, and puts all known start URIs on the work queue. This option does not increment the counter used for orphan detection (dbswitch).

--refetchuri (-F)

<crawl_collection>:<URI>

No

Forces a re-fetch of a URI in the specified crawl collection.

The URI does not need to have been previously crawled, but it must match the include/exclude rules for the crawl collection.

This also triggers crawling of the site to which the URI belongs (unless the site was already crawled in this refresh period).

--refetchsite

<crawl_collection>:<URI>

No

Forces a re-fetch of the crawl site from a URI in the specified crawl collection.

--force

No

Ensures that a URI is crawled immediately (potentially preempting active sites) when it is used with:

  • --adduri

  • --addurifile

  • --refetchuri

  • --refetchsite

--feed

No

Refeeds URIs to the content indexing process (regardless of item change status) when it is used with:

  • --refetchuri

  • --refetchsite

--refeedsite

<crawl_collection>:<crawl_site>

No

Refeeds all items in the crawler store to the content indexing process for a crawl site.

A <crawl_site> is a combination of a host name (which may be fully qualified) and a port number; together they uniquely identify a Web server.

This is equivalent to running the postprocess command-line tool with the -R parameter on a single crawl site, but it does not require stopping the crawler. To avoid overloading the crawler, limit the number of concurrent refeeds.

The re-fed URIs are put in a high-priority queue. Feeding occurs from both the high-priority and regular-priority queues at the same time, so you may notice a delay before an item becomes searchable.

--refeeduri

<crawl_collection>:<URI>

No

Refeeds the specified URI from the crawler store to the indexing process.

See --refeedsite for more information.

--refeedprefix

<prefix>

No

Specifies a URI prefix (including scheme) that URIs must match for refeeding.

Use with --refeedsite.

--refeedtarget

<destination>:<crawl_collection>

No

Specifies a feeding destination and content collection for the refeed options (--refeedsite and --refeeduri).
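
For example, to force an immediate re-fetch of a single URI, or to refeed all stored items for a crawl site (the collection name, URI, and site are placeholders):

<FASTSearchFolder>\bin\crawleradmin --refetchuri MyCollection:https://www.contoso.com/news/index.html --force

<FASTSearchFolder>\bin\crawleradmin --refeedsite MyCollection:www.contoso.com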

Preempting, quarantine, and deletion options

Option (and short name) Value Required Description

--preemptsite (-p)

<crawl_collection>:<crawl_site>

No

Preempts crawling of a site for a specified crawl collection.

--quarantine

<crawl_collection>:<crawl_site>:<time>

No

Blocks a site from crawling for a specified number of seconds.

--unquarantine

<crawl_collection>:<crawl_site>

No

Removes a crawl site from quarantine for a specified crawl collection.

--deletesite

<crawl_collection>:<crawl_site>

No

Deletes a crawl site from the crawler for a specified crawl collection.

--deluri

<crawl_collection>:<URI>

No

Deletes a URI from a crawl collection.

--delurifile

<crawl_collection>:<file>

No

Deletes URIs from a crawl collection. <file> is a file that contains the URIs to remove from the crawl collection, one per line.
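
For example, to delete a single URI from a crawl collection (the collection name and URI are placeholders):

<FASTSearchFolder>\bin\crawleradmin --deluri MyCollection:https://www.contoso.com/obsolete.html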

Statistics options

Option (and short name) Value Required Description

--collstats (-q)

<crawl_collection>

No

Displays crawl statistics for a crawl collection.

--collstatsquiet (-Q)

<crawl_collection>

No

Displays a summary of crawl statistics for a crawl collection.

--statistics (-c)

No

Displays crawl statistics for all collections.

--sitestats

<crawl_collection>:<crawl_site>

No

Displays crawl statistics for a specified crawl site in a crawl collection.

--cycle

A number from 1 to n, or "all"

No

Displays statistics from a specific crawl refresh cycle when it is used with:

  • --collstats

  • --collstatsquiet

  • --statistics

  • --sitestats

Specify "all" to merge statistics from all refresh cycles.

Default: the current cycle
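
For example, to display merged crawl statistics across all refresh cycles for a crawl collection (MyCollection is a placeholder name):

<FASTSearchFolder>\bin\crawleradmin -q MyCollection --cycle all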

Monitoring options

Option (and short name) Value Required Description

--status

No

Displays the status for all crawl collections, including the following:

  • Crawl collection name

  • Status (e.g., Idle)

  • Feeding status (e.g., Feeding)

  • Number of active sites

  • Number of stored items

  • Item Rate

--nodestatus

No

Displays the status per node for all crawl collections. The information reported resembles that of the --status option.

--active (-a)

No

Displays all active crawl collection names.

--nummanagers (-n)

No

Displays the number of crawl sites currently being crawled.

--sitemanagerstatus (-S)

<id>

No

Displays the status for the specified site manager <id>.

Status includes a list of sites that are currently crawling at the specified site manager, and additional information such as the work queue size, pending URIs, and global cache statistics.

--numworkers (-N)

<id>

No

Displays the number of active sites for a site manager.

--sites (-t)

<id>

No

Lists sites currently being crawled by a site manager.

--starturistat

No

Displays the feeding status of start URI files.
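
For example, to list all active crawl collections and to display the number of crawl sites currently being crawled:

<FASTSearchFolder>\bin\crawleradmin -a

<FASTSearchFolder>\bin\crawleradmin -n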

Debugging options (for advanced users)

Option (and short name) Value Required Description

--getlogin

<form URI>

No

Downloads a <form URI>, extracts logon information from the form, and produces an XML configuration from the form document.

--extractlinks

<URI>

No

Downloads a URI and extracts forward links from it.

--addheader

<http header string>

No

Adds additional HTTP headers to the HTTP requests when it is used with:

  • --getlogin

  • --extractlinks

--verifyuri

<crawl_collection>:<URI>

No

Verifies whether a URI can be crawled for a crawl collection, based on the include and exclude rules in the crawl collection configuration.

--getdata

<crawl_collection>:<URI>

No

Retrieves the downloaded content for a URI in a crawl collection from the crawler store.

--idn

<URI>

No

Gets an IDNA-encoded version of a URI.
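
For example, to check whether a URI is allowed by the include and exclude rules of a crawl collection, and to retrieve its stored content (the collection name and URI are placeholders):

<FASTSearchFolder>\bin\crawleradmin --verifyuri MyCollection:https://www.contoso.com/page.html

<FASTSearchFolder>\bin\crawleradmin --getdata MyCollection:https://www.contoso.com/page.html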

Examples

To add or update a crawl collection configuration in the crawler:

<FASTSearchFolder>\bin\crawleradmin -f MyCrawlCollectionConfig.xml

To remove a collection from the crawler:

<FASTSearchFolder>\bin\crawleradmin -d MyCollection

To suspend crawling of a collection:

<FASTSearchFolder>\bin\crawleradmin --suspendcollection MyCollection

To add a URI to a collection for crawling:

<FASTSearchFolder>\bin\crawleradmin --adduri MyCollection:https://www.contoso.com/

To show statistics for a crawl collection:

<FASTSearchFolder>\bin\crawleradmin -q MyCollection

To show statistics for a specific site in a crawl collection:

<FASTSearchFolder>\bin\crawleradmin --sitestats MyCollection:www.contoso.com

To temporarily block the www.contoso.com site from being crawled for 1 hour (even if the site is currently being crawled):

<FASTSearchFolder>\bin\crawleradmin --quarantine MyCollection:www.contoso.com:3600

To show the status for all collections in the crawler:

<FASTSearchFolder>\bin\crawleradmin --status
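
To force a re-fetch of an entire crawl collection (MyCollection is a placeholder name):

<FASTSearchFolder>\bin\crawleradmin --refetch MyCollection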