postprocess.exe reference


Applies to: FAST Search Server 2010 for SharePoint

Use the postprocess tool to manually refeed items to content indexing for one or more crawl collections. Before submission, each item's unique checksum fingerprint is checked against the duplicate database (unless duplicate detection is turned off).

By default, the logs from all item processor servers that are registered with the configuration component are aggregated.

Note

An identifier can be processed on different item processor servers, and logs from previous runs can still be present in the system. To show the most recent log entry for each identifier, use the -a command option.

Note

To use a command-line tool, verify that you meet the following minimum requirements: You are a member of the FASTSearchAdministrators local group on the computer where FAST Search Server 2010 for SharePoint is installed.

Syntax

<FASTSearchFolder>\bin\postprocess [options]

Parameters

Parameter Description

<FASTSearchFolder>

The path of the folder where you have installed FAST Search Server 2010 for SharePoint, for example C:\FASTSearch.

No option is required; all are optional.

postprocess general options

Option Value Description

-h

Displays help information.

-v

Displays version information.

-l

<log level>

Specifies the kind of information to log:

  • debug

  • verbose

  • info

  • warning

  • error

-I

<node identifier>

Specifies the node that postprocess is working with.

Do not specify unless you have deleted the node_id.dat file and you are sure that you are passing in the correct identifier.

-d

<path>

Specifies the data storage directory.

Use this option to store crawl data, runtime configuration, and logs in subdirectories within the specified directory.

Default: <FASTSearchFolder>\data\crawler

-R

<crawl_collections>

Refeeds the specified crawl collections. All items are submitted to content indexing again, even if they were previously added.

Specify <crawl_collections> as a single collection or a comma-separated list of collections (without any white spaces).

Specify '*' to refeed all collections. Be sure to use quotation marks around the asterisk.

-P

[<address>:]<port number>

Specifies the postprocess port.

Use <port number> to specify the beginning of a port number range used by postprocess (default: crawler baseport + 6).

Optionally, specify an IP address (as a host name or a numeric address).

Default: 13000

-U

<config file>

Uses the crawler global default configuration file.

This option first tries to find CrawlerGlobalDefaults.xml in the current directory. If not found, it looks in <FASTSearchFolder>\etc\.

Conflicting options that are specified on the command line override values in the configuration file.

-D

Enables direct I/O.

Enable only if supported by the OS.

Default: off
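
For example, to run a refeed of a single collection with more detailed logging, you might combine the -l and -R general options as follows (MyCollection is a placeholder collection name; adjust it to your environment):

<FASTSearchFolder>\bin\postprocess -l verbose -R MyCollection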

postprocess refeed (-R) mode options

Option Value Description

-r

<crawl_site>

Resumes refeeding after the specified crawl site. Do not use this option with -s.

Note

Use the special keyword @auto for <crawl_site> to have postprocess auto-resume the crawl where your last refeed left off.

-s

<crawl_site>

Processes only the specified site name (hostname).

Do not use this option with -r.

-i

<file>

Processes URIs and sites listed in the specified file.

The -r, -s and -i options are mutually exclusive.

<file> is a newline-separated file that contains URIs, crawl sites, or both.

-x

Processes all permitted URIs. Includes all URIs matching the current collection include/exclude rules; ignores URIs that do not match.

Use with the -u option to specify an updated collection specification XML file.

-X

Deletes excluded URIs, that is, URIs that do not match the collection specification include/exclude rules.

All other URIs are ignored, unless combined with -x (to process all permitted URIs).

This option is useful with -u.

-b

Applies robots.txt rules when the program is checking for included/excluded items. Robots.txt rules are also applied to the -x and -X options.

-u

<XML config file>

Updates the include/exclude regular expressions loaded from the configuration database with rules from the <XML config file>. This update is only in effect during the postprocess refeed.

-b

Resumes feeding content from the content indexing queues.

If the crawler is processing faster than content indexing, postprocess queues items to disk and works off those queues. This option processes (empties) any existing queues.

-k

<content_collection>|<feeding_target>

Overrides the content collection specified as the crawl's feed destination.

Or, you can specify the symbolic name of a feeding target (as defined in the collection configuration).

Default: <content_collection>
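
For example, to refeed a crawl collection into a content collection other than the one configured as its feed destination, you might combine -R with -k as follows (MyCollection and OtherContentCollection are placeholder names):

<FASTSearchFolder>\bin\postprocess -R MyCollection -k OtherContentCollection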

postprocess internal options

Use these options for testing and debugging only, or internally when postprocess is started by the crawler.

Option Value Description

-p

<port>

Connects to the node scheduler process at <port>.

-S

<port>

Listens at <port>. Used internally only.

-o

Enables FAST Search output mode.

-F

<fileserver port>

Tells postprocess to use an external file server at the given port.

-T

Enables profiling.

Use the profile_methods.py tool to review the profile.

-t

Enables profiling using the hotshot module.

Use the profile_lines.py tool to review the profile.

-n

Uses a null feeder (the equivalent of /dev/null); items are not fed.

Use for testing.
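
For example, to test a refeed of a collection without actually feeding any items, you might combine -R with the null feeder as follows (MyCollection is a placeholder collection name):

<FASTSearchFolder>\bin\postprocess -R MyCollection -n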

Examples

This example refeeds the crawl collection MyCollection to content indexing:

<FASTSearchFolder>\bin\postprocess -R MyCollection

The following example refeeds all crawl collections to content indexing:

<FASTSearchFolder>\bin\postprocess -R '*'

The following example refeeds the crawl site www.contoso.com in the crawl collection MyCollection to content indexing:

<FASTSearchFolder>\bin\postprocess -R MyCollection -s www.contoso.com

To refeed the collection MyCollection to content indexing while removing items that no longer match the crawl collection include/exclude rules, follow this example:

<FASTSearchFolder>\bin\postprocess -R MyCollection -x -X

This example updates the crawl collection MyCollection with custom include/exclude rules and refeeds items matching the rules:

<FASTSearchFolder>\bin\postprocess -R MyCollection -x -u MyCustomConfig.xml

MyCustomConfig.xml typically contains finer-grained include or exclude rules for the items that you want to reprocess than an option such as -s can express.
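
The following sketch refeeds only the URIs and crawl sites listed in a newline-separated file by using the -i option (MyUriList.txt is a placeholder file name):

<FASTSearchFolder>\bin\postprocess -R MyCollection -i MyUriList.txt

If an earlier refeed of MyCollection was interrupted, you might resume it where it left off by passing the @auto keyword to the -r option:

<FASTSearchFolder>\bin\postprocess -R MyCollection -r @auto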

Remarks

The crawler uses postprocess to perform duplicate detection and item submission to content indexing. Like the site manager processes, postprocess starts automatically with the crawler. You can also run postprocess on its own when the crawler is not running, to manually refeed items in one or more crawl collections.

Postprocess submits new, modified, and deleted items as the crawler encounters them. Before submission, each item is checked against the duplicate database, unless duplicate detection is turned off.