Manage crawl rules (FAST Search database connector)

 

Applies to: FAST Search Server 2010

Specify which content should be indexed

The FAST Search database connector crawls content by following SQL select statements. These SQL select statements can be specified either in the JDBCSQL parameter or in a file that you point to by specifying the JDBCSQLFile parameter. Specify one of these parameters in the Input group of the connector configuration file. All selected columns will be indexed by default.

To manage which content should be indexed, select specific columns in the select statements. For example:

SELECT Product.rowguid ID, Product.name Name, Product.weight weight
FROM Product

Specify which content should be transformed

More advanced result sets may require more advanced result set handling. The following parameters from the configuration file can be used to indicate what content should be transformed, and how.

Filter category Parameter Short description

Transformation

UseMultiRowNormalizer

If set to true, values for columns of records with identical identifiers are merged.

In order for this feature to work, the SELECT statement must have an ORDER BY clause on the primary key/identifier field.

Transformation

JDBCMultiValueIncludeFields

Semicolon (;) separated list of column names that should be concatenated in the output. This field overrides any settings to JDBCMultiValueSkipFields..

Do not use this setting together with JDBCAttributeColumn and JDBCValueColumn

Transformation

JDBCMultiValueSkipFields

Semicolon (;) separated list of column names that have the same values in each row, and that should not be repeated in the output.

Do not use this setting together with JDBCMultiValueIncludeFields.

Transformation

JDBCAttributeColumn

For normalized tables, specifies the column that contains attribute names.

Do not use this setting together with JDBCMultiValueIncludeFields and JDBCMultiValueSkipFields.

Transformation

JDBCValueColumn

For normalized tables, specifies the column that contains attribute values.

Do not use this setting together with JDBCMultiValueIncludeFields and JDBCMultiValueSkipFields.

How to use the multi row normalizer

When joining tables, item IDs may be duplicated over several rows. This leads to several rows containing the same identifier. Use the multi row normalizer to collapse these rows into one item.

When you set the parameter UseMultiRowNormalizer to true, text from multiple rows with the same JDBCPrimaryKeyField value are concatenated. If a column contains a non-value, like NULL for example, this sometimes causes the string “NULL” to appear in the result set. To prevent this, use the isnull function when you use SQL Server (or a similar function for a different database type) in the parameter JDBCSQL. For example: SELECT ID, intro, body, isnull( comments, ‘’ ) comments FROM cms.articles;

How to use the parameters JDBCAttributeColumn and JDBCValueColumn

Items extracted from a database may have the desired item attribute names in one column, and the column values in another column. Use these two parameters to specify which column contains the attribute names, and which column contains the attribute value.

How to connect to the source database

Configuring the FAST Search database connector to connect to the source database requires a JDBC connection string to be created for the source database. Different databases use different connection strings. Specify the connection string to the source database in the parameter JDBCURL.

Examples of JDBC connection strings for common databases:

Microsoft Microsoft SQL Server 2000:

jdbc:microsoft:sqlserver://<YourServerHere>:1433;DatabaseName=<YourDatabaseNameHere>

Where:

  • <YourServerHere> is the SQL Server host name

  • <YourDatabaseNameHere> is the name of the source database that you are connecting to

Microsoft Microsoft SQL Server 2005 and later versions

jdbc:sqlserver://<YourServerHere>:1433;database=<YourDatabaseNameHere>

Where:

  • <YourServerHere> is the SQL Server host name

  • <YourDatabaseNameHere> is the name of the source database that you are connecting to

IBM DB2

jdbc:db2://<YourServerHere>:50000/<YourDatabaseNameHere>

Where:

  • <YourServerHere> is the IBM DB2 server host name

  • <YourDatabaseNameHere> is the name of the source database that you are connecting to

Oracle

jdbc:oracle:thin:@<YourServerHere>:1521:<YourDataBaseNameHere>

Where:

  • <YourServerHere> is the Oracle server host name

  • <YourDatabaseNameHere> is the name of the source database that you are connecting to

Connecting to databases from providers other than Microsoft also requires the JDBC driver from the specific provider to be put in the /lib directory of your FAST Search Server 2010 for SharePoint installation folder. Refer to the database provider to find out how to obtain the JDBC driver.

See Also

Concepts

jdbctemplate.xml reference