SharePoint keyword filtering

 

Applies to: Forefront Security for SharePoint

Keyword filtering analyzes the contents of text, HTML, Word, Excel, PowerPoint, and Office 2007 Open XML documents to identify unwanted or prohibited content. By creating keyword filter lists, you can filter documents based on words, phrases, and sentences.

Creating new keyword lists

For maximum flexibility, you can create your own lists of keywords to scan for. You can thus maintain individual lists of filters for use by different scan jobs.

To create a new keyword list

  1. In the FILTERING section of the Shuttle Navigator, click the Filter Lists icon.

  2. In the List Types pane, select Keywords.

  3. In the List Names section, click the Add button.

  4. Type a name for the new list, and then press ENTER. The empty list appears in the List Names section.

  5. With the new list name selected, click the Edit button. The Edit Filter List dialog box appears. Use it to add content to your filter list.

  6. In the Include In Filter section, click the Add button.

  7. Type a word or phrase to be included in the filter list, and then press ENTER. You may add as many words or phrases as you want, but each must be entered separately.

    The Exclude From Filter section is used to enter keywords or phrases that should never be included on the Keyword list. This prevents those words and phrases from accidentally being added when importing a list from a text file. For more information on importing files, see Importing items into a filter list.

  8. When you are finished adding items, click OK. The list of words you just entered appears, alphabetically, in the pane next to List Names.

  9. Click Save.

Configuring keyword lists

After you have created a keyword list, you must configure it.

To configure a keyword list

  1. In the Shuttle Navigator, click FILTERING.

  2. Click the Keyword icon. The Keyword Filtering work pane appears.

  3. In the top pane, select the scan job for which you would like to enable an existing keyword filter.

  4. In the Keyword Fields section, select Text/HTML/Word/PowerPoint Documents.

  5. In the Filter Lists section, select one of the filter lists you have created.

  6. Using the Filter field, set the filter to Enabled.

  7. Set the Action. For more information, see Keyword filter actions.

  8. Indicate if you would like to Send Notifications. Notifications are disabled by default.

  9. Indicate if you would like to Quarantine identified files. Enabling quarantine causes deleted files to be stored, permitting you to recover them. Quarantining is enabled by default.

  10. Indicate the Minimum Unique Keyword Hits. This setting specifies how many unique keywords must be matched for the action to be taken. The default is one (1). For example, suppose you set the minimum unique keyword hits value to 3, and the word "wonderful", which is in the list, appears three times in the document, but no other word in the list appears at all. The keyword filter is not matched, because only one unique term in the list was found.

  11. Click Save.
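
The Minimum Unique Keyword Hits rule in step 10 can be illustrated with a minimal Python sketch. This is not the product's implementation: the function names are hypothetical, and plain case-insensitive substring matching is assumed purely for illustration.

```python
def unique_keyword_hits(keywords, text):
    """Count how many distinct keywords occur in the text at least once.

    Assumption: case-insensitive substring matching stands in for the
    product's real matching rules.
    """
    lowered = text.lower()
    distinct = {keyword.lower() for keyword in keywords}
    return sum(1 for keyword in distinct if keyword in lowered)


def filter_triggered(keywords, text, minimum_unique_hits=1):
    """Return True when enough *different* keywords were found."""
    return unique_keyword_hits(keywords, text) >= minimum_unique_hits


keyword_list = ["wonderful", "amazing", "fantastic"]
document = "A wonderful day, a wonderful meal, a wonderful evening."

# Repeated hits on the same keyword count only once.
hits = unique_keyword_hits(keyword_list, document)
triggered_at_3 = filter_triggered(keyword_list, document, minimum_unique_hits=3)
triggered_at_1 = filter_triggered(keyword_list, document, minimum_unique_hits=1)
```

With a minimum of 3, the sample document does not trigger the filter, because "wonderful" is the only keyword from the list that appears.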

Filters for racial discrimination, sexual discrimination, spam, and any other custom lists must be created individually. For profanity filters, see Example lists.

Keyword filter actions

You must indicate the action that Forefront Security for SharePoint should take upon detecting a match to your filter criteria.

Note

You must set the action for each keyword filter you configure. The action setting is not global.

Skip: detect only

Records the number of files that meet the filter criteria, but allows them to be processed normally. If, however, Delete Corrupted Compressed, Delete Corrupted Uuencode Files, or Delete Encrypted Compressed Files was selected in General Options, a match to any of those conditions will cause the item to be deleted.

Block: prevent transfer

Prevents the transfer of a file that meets the filter criteria. This action is for Realtime scans only.

Delete: remove infection

Deletes the contents of the file and replaces it with the Deletion Text. This action is for Manual scans only.
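
As a quick reference, the scan-job restrictions described above can be written as a small Python lookup. This is only an illustrative sketch: the names are hypothetical, and treating Skip as valid for both Realtime and Manual scans is an assumption, because the text states no restriction for it.

```python
# Scan job types on which each action can be used, per the descriptions above.
ALLOWED_SCAN_JOBS = {
    "Skip": {"Realtime", "Manual"},  # assumption: no restriction is stated for Skip
    "Block": {"Realtime"},           # "This action is for Realtime scans only."
    "Delete": {"Manual"},            # "This action is for Manual scans only."
}


def action_applies(action, scan_job):
    """Return True if the action is available for the given scan job type."""
    return scan_job in ALLOWED_SCAN_JOBS.get(action, set())
```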

Keyword list syntax rules

The following are the syntax rules for a keyword filter list. Take care to use the correct syntax, because Forefront Security for SharePoint (FSSP) does not validate it. If the filtering results are not what you expect, double-check your syntax.

  • Each item (line of text) is considered a search query.
  • Queries use the OR operator. It is considered to be a positive detection if any entry is a match.
  • Queries are composed of operands (keywords), which are text tokens or a string of text tokens, such as:
    • apple (means that the text contains “apple”)
    • apple juice (means that the text contains “apple juice”)
    • get rich quick (means that the text contains “get rich quick”)
  • Queries may also contain operators that precede or separate operands in an expression.
  • An expression may consist of a single operand, an operand preceded by the _NOT_ or _HAS[#]OF_ operators, or two operands joined by the _AND_, _ANDNOT_, or _WITHIN[#]OF_ operators.
    The following logical operators are supported in expressions. There must be a space between an operator and an operand (or another operator), represented in the examples by the • character:
    • _AND_ (logical AND). For example, apples•_AND_•oranges. A filter such as this would be matched if the text contains both “apples” and “oranges”.
    • _NOT_ (negation). For example, _NOT_•oranges. A filter such as this would be matched if the text does not contain “oranges”.
    • _ANDNOT_ (logical AND negation). For example, apples•_ANDNOT_•oranges. A filter such as this would be matched if the text contains “apples” but does not contain “oranges”. _ANDNOT_ is functionally equivalent to _AND_•_NOT_.
    • _HAS[#]OF_ (frequency). Specifies the minimum number of times that the text must appear in order for the query to be considered true. For example, _HAS[4]OF_•get rich quick. If the phrase "get rich quick" is found in the text four or more times, this query is true. This operator implicitly has a default value of 1 when it is not specified.
    • _WITHIN[#]OF_ (proximity). If the two terms are within a specified number of words before or after each other, there is a match. For example, free•_WITHIN[10]OF_•offer. If "free" appears within 10 words before or after "offer", this query is true.
      Multiple operators are permitted in a single query. The precedence of the operators is (from highest to lowest):
    • _WITHIN[#]OF_
    • _HAS[#]OF_
    • _NOT_, _AND_, and _ANDNOT_ (these are at the same precedence level because they are used in conjunction when part of an expression)
      This precedence cannot be overridden with parentheses. Other considerations are:
    • The logical operators must be entered in uppercase letters.
    • Phrases may be used as keywords. For example, apple juice or get rich quick. Quotation marks are not used.
    • Multiple blank spaces (blank characters, line feed characters, carriage return characters, horizontal tabs, and vertical tabs) are treated as one blank space for matching purposes. For example, A••••B is treated as A•B and matches the phrase A•B.
    • In HTML-encoded message texts, punctuation (any non-alphanumeric character) is treated as a word separator similar to blank spaces. Therefore, words surrounded by HTML tags can be properly identified by the filter. However, note that the filter '<html>' will match '<html>', but not 'html'.
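
The whitespace rule above can be sketched in a few lines of Python. The function name is hypothetical and the code only illustrates the stated rule (runs of blanks, line feeds, carriage returns, horizontal tabs, and vertical tabs count as one space); it does not model the HTML punctuation handling.

```python
import re

# The five whitespace characters named in the rule above.
_WHITESPACE_RUN = re.compile(r"[ \n\r\t\v]+")


def collapse_whitespace(text):
    """Replace every run of whitespace characters with a single space."""
    return _WHITESPACE_RUN.sub(" ", text)


# A keyword entered with several spaces and document text broken across
# lines both reduce to the same form before comparison.
keyword = collapse_whitespace("A    B")
document = collapse_whitespace("start A \t\r\n\v B end")
found = keyword in document
```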

Examples (the • character represents a space):

  • apples•_AND_•oranges•_AND_•lemons•_WITHIN[50]OF_•juice
    This expression means that “apples”, “oranges”, and “lemons” all appear at least once, and that “lemons” is within 50 words of “juice”.
  • confidential•_WITHIN[10]OF_•project•_AND_•banana•_WITHIN[25]OF_•shake
    This expression means that “confidential” is within 10 words of “project”, and that “banana” is within 25 words of “shake”.
  • _HAS[2]OF_•get rich•_WITHIN[20]OF_•quick
    This expression means that “get rich” appears at least 2 times within 20 words of “quick”.
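
To make the operator precedence concrete, the following Python sketch evaluates queries of the shapes shown in this section. It is an illustration under stated assumptions, not the FSSP matching engine: all function names are hypothetical, matching is case-insensitive on whole words, distance for _WITHIN[#]OF_ is taken as the difference between the starting word positions of the two phrases, and only the operator orderings used in the examples are handled. Real spaces are used where the examples show the • character.

```python
import re

WORD = re.compile(r"\w+")
# One clause: optional _NOT_, optional _HAS[n]OF_, a phrase, and an optional
# "_WITHIN[n]OF_ phrase" tail. WITHIN binds tightest, then HAS, then NOT.
CLAUSE = re.compile(
    r"^(?P<neg>_NOT_\s+)?(?:_HAS\[(?P<has>\d+)\]OF_\s+)?(?P<left>.+?)"
    r"(?:\s+_WITHIN\[(?P<within>\d+)\]OF_\s+(?P<right>.+))?$"
)


def words(text):
    """Split text into lowercase words; punctuation acts as a separator."""
    return [w.lower() for w in WORD.findall(text)]


def positions(tokens, phrase):
    """Return the word indexes at which the phrase starts."""
    p = words(phrase)
    if not p:
        return []
    return [i for i in range(len(tokens) - len(p) + 1) if tokens[i:i + len(p)] == p]


def clause_matches(clause, tokens):
    m = CLAUSE.match(clause.strip())
    if m is None:
        raise ValueError(f"cannot parse clause: {clause!r}")
    hits = positions(tokens, m["left"])
    if m["within"] is not None:
        limit = int(m["within"])
        near = positions(tokens, m["right"])
        # Keep only occurrences that have the right-hand phrase nearby.
        hits = [i for i in hits if any(abs(i - j) <= limit for j in near)]
    matched = len(hits) >= int(m["has"] or 1)  # _HAS[#]OF_ defaults to 1
    return not matched if m["neg"] else matched


def query_matches(query, text):
    """Evaluate one list entry; _AND_ and _ANDNOT_ are applied left to right."""
    tokens = words(text)
    parts = re.split(r"\s+(_ANDNOT_|_AND_)\s+", query.strip())
    result = clause_matches(parts[0], tokens)
    for operator, clause in zip(parts[1::2], parts[2::2]):
        hit = clause_matches(clause, tokens)
        result = result and (hit if operator == "_AND_" else not hit)
    return result


def list_matches(queries, text):
    """A list is a positive detection if any entry matches (implicit OR)."""
    return any(query_matches(q, text) for q in queries)
```

For instance, `query_matches("_HAS[2]OF_ get rich _WITHIN[20]OF_ quick", text)` counts only those occurrences of "get rich" that lie within 20 words of "quick" and then compares that count with 2, which mirrors the precedence order given above.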

Viewing keyword list contents

You can view the contents of any selected filter list by clicking the Lists button (in the Filter Lists section of the Keyword Filtering work pane), selecting an item, and then clicking View List. Click the Back button (the left-pointing arrow) when you are finished viewing the contents.

Case sensitive filtering

The General Option Case Sensitive Keyword Filtering setting causes Forefront Security for SharePoint to use case-sensitive comparisons for all keyword filters. By default, comparisons are not case-sensitive. For more information, see "General Options" in SharePoint Forefront Server Security Administrator.

Example lists

To aid you in filtering for profanity, example lists in various languages are included with the product. These lists are an optional component of FSSP and must be installed separately.

If you want to install one or more of these lists, follow these steps.

To install the example lists

  1. Find the file called KeywordInstaller.msi in the installation folder and double-click it.

    Note

    The .msi file is not present on a computer that has an Administrator-only installation or that does not contain a Forefront Security product.

  2. Read and accept the license agreement/disclaimer.

  3. From the list of available files, select any number of the language files. The files you select are placed into a folder called Example Keywords in the database directory (which, by default, is C:\Program Files (x86)\Microsoft Forefront Security\SharePoint\Data).

  4. After the files have been extracted, you must import them into your filters. For more information on importing files, see Importing items into a filter list.

Note

It is your responsibility to visually inspect all of the selected files to determine whether they contain words that are completely harmless in your environment, especially if you are using multiple language files. Review the imported list and decide whether to eliminate any word clashes. If a certain word is unacceptable in one language but harmless in another, you must decide which is more important to you: catching everything (the default, if you accept all the words in all the selected lists) at the risk of false positives, or avoiding those false positives by deleting words from the list at the risk of missing something.

Importing items into a filter list

Data for filter lists may be created offline in Notepad or a similar text editor and then imported into the appropriate filter list using the Forefront Server Security Administrator. Note that Forefront Security for SharePoint can import only lists saved as UTF-16 or ANSI files. Files in other Unicode encodings will not be imported properly.

To create and import entries into a filter list

  1. Create a list and save it as a text file. Place each filter on its own line in the file.

  2. In the FILTERING section of the Shuttle Navigator, click Filter Lists.

  3. Select the filter list into which you will be importing data.

  4. Click Edit. The Edit Filter List dialog box appears.

  5. Click the Import button. A File Explorer window opens. Use it to navigate to the text file you created in step 1.

  6. Select the file and click Open.

  7. The file is imported into the middle (New Items) pane of the Import List editor to enable you to select the entries you would like to include in your filter list. Use the <=== button to move all the items into the Include In Filter pane or use the <--- button to move single items. You can use the right-pointing arrows to move items into the Exclude From Import pane.

  8. When you have moved all the desired items, click OK.

  9. Click Save.
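
If you prefer to generate the text file for step 1 with a script rather than in Notepad, a minimal Python sketch such as the following writes one filter per line in UTF-16. The function name and file name are hypothetical, and the use of Windows-style line endings is an assumption based on the Notepad workflow described above.

```python
def write_filter_file(path, entries):
    """Write one filter entry per line, encoded as UTF-16 with a byte order mark."""
    lines = [entry.strip() for entry in entries if entry.strip()]
    # newline="" disables newline translation, so the explicit \r\n is kept as is.
    with open(path, "w", encoding="utf-16", newline="") as handle:
        for line in lines:
            handle.write(line + "\r\n")


if __name__ == "__main__":
    # Hypothetical file name; import it through the Edit Filter List dialog box.
    write_filter_file(
        "keyword_import.txt",
        ["get rich quick", "free _WITHIN[10]OF_ offer"],
    )
```

Blank entries are skipped so that each line in the resulting file is a single filter.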