
Understanding Policy Rule Syntax

 

Applies to: Office 365 for enterprises, Live@edu, Forefront Online Protection for Exchange

Topic Last Modified: 2012-05-02

When creating policy rules in Forefront Online Protection for Exchange (FOPE), you have two options for the syntax of the rule. First, you can use comma-separated values (CSV) mixed with the string-wildcard syntax, which is listed as Basic in the FOPE Administration Center Policy Rules editor. Second, you can use a subset of characters specified in the Regular Expression syntax, which is listed as RegEx in the FOPE Administration Center Policy Rules editor, for more complex rules.

If you are not familiar with RegEx syntax, select the Basic option. It lets you create simple expressions for content filtering, and it helps you write correctly formatted match expressions by validating each expression against the context in which it is defined. For example, only valid IP addresses are accepted in the IP address field. If the rules you want to create require more complexity, such as matching patterns of numbers and patterns of words, select the RegEx option to use a subset of RegEx syntax characters. With this option, context-based validation is not available; only syntax-specific validation applies.

When you select Basic in the Policy Rules editor, you will be able to list expressions using a CSV syntax separating the match expressions with commas. Additionally, you will be able to enhance match expressions with simple string-wildcard metacharacters. The same syntax applies to dictionary files uploaded through Filters.

 

Metacharacter Description Example

,

A comma is the choice metacharacter, also known as an alternation or a separator, and it matches the expression listed either before or after the comma.

abc, def, xyz matches abc or def or xyz

*

An asterisk is the wildcard metacharacter and it represents zero or more characters.

Note:
This character is equivalent to (.*) in the RegEx syntax.

ab* matches ab or aba or abaa or abaaa or ab12345667, etc.

?

The question mark represents any single character.

Note:
This character is equivalent to (.) in the RegEx syntax.

ab? matches aba or abb or abc or ab1 or ab2, etc.

\

A backslash is the escape operator. In order to match the literals (,) (*) (?) or (\) and to suppress their special meaning in the basic syntax, the escape operator needs to be placed in front of the basic syntax operators.

Note:
This character is equivalent to (\) in the RegEx syntax.

\*a\\bc\? matches *a\bc?

/0 up to /32

A slash mark indicates Classless Inter-Domain Routing (CIDR) notation. This can be expressed by adding a slash mark (/) followed by a number from 0 up to 32 after the last octet of an IP address.

Note:
CIDR notation applies only to IP address expressions and cannot be used in other contexts.

99.99.98.0/23 matches IP ranges from 99.99.98.0 up to 99.99.99.255
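
The range in this example can be checked with a short script. The following is a minimal sketch that uses Python's standard ipaddress module; it only illustrates the CIDR arithmetic and is not how FOPE evaluates the rule. The helper name in_range is hypothetical.

```python
import ipaddress

# The CIDR block from the example row above.
network = ipaddress.ip_network("99.99.98.0/23")

# A /23 prefix leaves 9 host bits, so the block spans 512 addresses.
first_address = network[0]
last_address = network[-1]
print(first_address, last_address, network.num_addresses)


def in_range(address: str) -> bool:
    """Return True if the address falls inside the CIDR block (hypothetical helper)."""
    return ipaddress.ip_address(address) in network
```

Here in_range("99.99.99.1") is true and in_range("99.99.100.0") is false, because the second address lies just past the end of the block.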

Note:
The total number of characters inserted into any Policy Rule field or dictionary cannot exceed 9,000. Dictionary file size limit is 2 MB.
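
Because the notes above map each Basic metacharacter to a RegEx equivalent, the Basic syntax can be illustrated by translating it into a regular expression. The following is a minimal sketch in Python, not the FOPE implementation. The function names are hypothetical, and two behaviors are assumptions: white space around a comma-separated term is ignored, and a term must match the whole input.

```python
import re

# Basic wildcards and the RegEx equivalents given in the notes above.
WILDCARDS = {"*": ".*", "?": "."}


def term_to_regex(term: str) -> str:
    """Translate one Basic term into a Python regular expression."""
    pieces = []
    chars = iter(term)
    for ch in chars:
        if ch == "\\":
            # Escape operator: the next character is taken literally.
            pieces.append(re.escape(next(chars, "\\")))
        elif ch in WILDCARDS:
            pieces.append(WILDCARDS[ch])
        else:
            pieces.append(re.escape(ch))
    return "".join(pieces)


def basic_to_regex(expression: str) -> str:
    """Translate a comma-separated Basic expression into one alternation."""
    # A term is a run of escaped pairs (\x) or characters other than , and \,
    # so only unescaped commas act as separators.
    raw_terms = re.findall(r"(?:\\.|[^,\\])+", expression)
    # Assumption: white space around a term is not significant.
    terms = [t.strip() for t in raw_terms if t.strip()]
    return "|".join(term_to_regex(t) for t in terms)


def basic_match(expression: str, text: str) -> bool:
    """Assumption: a Basic term has to match the whole input string."""
    return re.fullmatch(basic_to_regex(expression), text) is not None
```

For example, basic_to_regex("abc, def, xyz") returns abc|def|xyz, and basic_match("ab*", "ab12345667") is true.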

When you select RegEx in the Policy Rules editor, you can specify more complex expressions that match patterns of text, numbers, or special characters. For example, you can match many different variations of a word such as viagra, vi@gra, vlagra using a subset of RegEx characters. This will allow you to minimize the number of rules needed and to create powerful matching expressions, such as scanning for credit card numbers, social security numbers, email addresses, and similar strings of sensitive words or numbers.
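
As an illustration of the kind of rule described above, the following sketch uses Python's re module with two hypothetical patterns built only from characters in the supported subset (literals, the period, the pipe, and \d). The patterns and sample strings are assumptions chosen for the example; Python's engine is a stand-in and may differ from FOPE's in details.

```python
import re

# One expression covers several spellings: "." stands for any single
# character, and "|" adds the variant that swaps the "i" for an "l".
SPELLING_VARIANTS = re.compile(r"vi.gra|vlagra")

# A pattern shaped like a U.S. social security number: ddd-dd-dddd.
SSN_LIKE = re.compile(r"\d\d\d-\d\d-\d\d\d\d")

samples = ["viagra", "vi@gra", "vlagra", "vinegar"]
spelling_hits = [s for s in samples if SPELLING_VARIANTS.search(s)]

messages = ["ID 123-45-6789 on file", "order 123-456-789 shipped"]
ssn_hits = [m for m in messages if SSN_LIKE.search(m)]

print(spelling_hits)
print(ssn_hits)
```

A single expression replaces three separate word rules here, which is the reduction in rule count that the RegEx option is meant to provide.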

The RegEx option in the Policy Rules editor is a subset of the POSIX Basic and Extended Regular Expressions syntax, expressed in the following table.

 

Character type Character Description Example

Meta

^

The caret metacharacter matches the starting position within the string.

Note:
Used in combination with the dollar sign character, the caret has the same functionality as the exact match option.

^abc matches abc1234, but will not match 1234abc

Meta

$

The dollar sign metacharacter matches the ending position of the string, or the position just before a string-ending newline.

Note:
In combination with the caret character, the dollar sign offers the same functionality as the exact match option.

abc$ matches 1234abc, but will not match abc1234

Meta

*

The star matches the preceding element zero or more times.

Important:
This character should be used with caution; match expressions using this character might match more than intended.

ab*x matches abx or abbx or abbbx or abbbbx, etc.

Meta

+

The plus metacharacter matches the preceding element one or more times.

Important:
This character should be used with caution; match expressions using this character might match more than intended.

ab+x matches abx or abbx or abbbx or abbbbx, etc.

Meta

.

The period metacharacter matches any single character except new line.

ab.x matches ab1x or ab2x or ab3x or ab4x, etc.

Meta

?

The question mark matches the preceding element zero or one times.

ab? matches a or ab

Meta

|

The pipe is a choice, or alternation, character, which matches the expression either before or after the operator, starting from the first (left) string and stopping when a match is found.

abc|def|xyz matches abc or def or xyz or abc12345, but will not match a123c or axm

Meta

\

The backslash causes RegEx metacharacters to be treated as literal characters in the context of the rule.

x\*1\.5\+9\\x=y matches x*1.5+9\x=y

Class

\w

The backslash with lowercase w matches any word character, including alphanumeric characters with "_".

\w123 matches a123 or bbb123 or c_c123xxx, but will not match @123

Class

\d

The backslash with lowercase d matches any decimal digit.

\dabc matches 123abc or 12345abcxxx or 1abc1, but will not match abc123 or @abc123

Class

\s

The backslash with lowercase s matches any white-space character.

abc\sdef matches abc def
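
The rows in the table above can be tried out before a rule is deployed. The following is a minimal sketch that replays the Example column with Python's re module. Python is only a stand-in here: for the characters in this subset its behavior is expected to agree with the table, but FOPE's own engine is the authority. The names TABLE_ROWS and find_mismatches are hypothetical.

```python
import re

# (pattern, strings that should match, strings that should not match).
# Most cases come from the Example column; a few extra negatives
# (such as "ax" for ab+x) are added for contrast.
TABLE_ROWS = [
    (r"^abc", ["abc1234"], ["1234abc"]),
    (r"abc$", ["1234abc"], ["abc1234"]),
    (r"^abc$", ["abc"], ["abc1234", "1234abc"]),  # caret plus dollar: exact match
    (r"ab*x", ["abx", "abbx", "abbbx"], ["ab"]),
    (r"ab+x", ["abx", "abbx"], ["ax"]),
    (r"ab.x", ["ab1x", "ab2x"], ["abx"]),
    (r"ab?", ["a", "ab"], ["b"]),
    (r"abc|def|xyz", ["abc", "def", "xyz", "abc12345"], ["a123c", "axm"]),
    (r"x\*1\.5\+9\\x=y", [r"x*1.5+9\x=y"], []),
    (r"\w123", ["a123", "bbb123", "c_c123xxx"], ["@123"]),
    (r"\dabc", ["123abc", "12345abcxxx", "1abc1"], ["abc123", "@abc123"]),
    (r"abc\sdef", ["abc def"], ["abcdef"]),
]


def find_mismatches(rows):
    """Return (pattern, text, expected) for every case that disagrees."""
    mismatches = []
    for pattern, should_match, should_not_match in rows:
        cases = [(t, True) for t in should_match]
        cases += [(t, False) for t in should_not_match]
        for text, expected in cases:
            # re.search looks for the pattern anywhere in the text,
            # which is how the table's examples are phrased.
            found = re.search(pattern, text) is not None
            if found != expected:
                mismatches.append((pattern, text, expected))
    return mismatches


print(find_mismatches(TABLE_ROWS))
```

An empty list means every example behaves as the table describes under Python's engine.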

Important:
If you are unfamiliar with RegEx syntax, we recommend that you use the Basic option, or that you test rules by using the Test policy rule action before using them with policy rule actions such as Reject, Encrypt, or Redirect. The Administration Center Policy Rules editor supports only a subset of RegEx characters.
The total number of RegEx characters inserted into any Policy Rule field cannot exceed 9,000.

The following are examples of RegEx expressions matching different parts of a message:

  • A period used to match file extensions will match any single character after your expression. For example, the match expression r. matches any file extension that begins with the letter r followed by any single character, such as r1 or another two-character combination.
  • To be matched as literals, all RegEx metacharacters must be preceded by the escape operator. Characters that are not RegEx metacharacters are matched literally and do not need the escape operator. If you want to match the period in a domain name, escape the period by writing it as \. in the expression. The match expression contoso\.com will match contoso.com.
  • For Domain options, the domain matching acts on the presence of the sender or recipient domain in an e-mail header. For example, a rule to take action on contoso.com will also affect messages for the subdomain 123.contoso.com. If you want to match only the domain name without any subdomains, configure the rule by using the caret metacharacter: ^contoso\.com (with the period escaped, as described in the previous item). A rule configured in this manner matches only e-mails sent to or received from contoso.com and not 123.contoso.com.
  • If you want to search the subject, body, or attachment file name of an e-mail for terms that end with the string “ness”, combine the period and the asterisk to perform your match. For example, the search term .*ness will return results such as “wilderness” or “happiness”.
  • A period . followed by an asterisk * represents zero or more characters. For example, the match expression contoso\..* would match “contoso.com” but would also match “contoso.microsoft.com” or “contoso.mydomain.ca”, etc. To match only the top-level domain (TLD) of a domain, a more precise match expression is required. contoso\.\w\w\w$ will match “contoso.com” or “contoso.org” or “contoso.tv1”, but will not match any domain with more or fewer than three word characters after the period.
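
The domain and term examples in the list above can be compared side by side. The following is a minimal sketch using Python's re module as a stand-in for the FOPE matcher; it assumes the expression is searched for anywhere in the value, and the helper name matches is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


results = {
    # An escaped period matches only a literal period.
    "escaped period": matches(r"contoso\.com", "contoso.com"),
    # An unescaped period matches any character, so this is also true.
    "unescaped period": matches(r"contoso.com", "contosoXcom"),
    # Without an anchor, the subdomain is matched as well.
    "subdomain, no anchor": matches(r"contoso\.com", "123.contoso.com"),
    # The caret restricts the match to values that start with contoso.
    "subdomain, anchored": matches(r"^contoso\.com", "123.contoso.com"),
    # .* before a suffix finds terms that end in "ness".
    "suffix term": matches(r".*ness", "wilderness"),
    # .* after the period accepts any number of labels.
    "open-ended domain": matches(r"contoso\..*", "contoso.microsoft.com"),
    # Exactly three word characters, then the end of the string.
    "three-character TLD": matches(r"contoso\.\w\w\w$", "contoso.org"),
    "four-character TLD": matches(r"contoso\.\w\w\w$", "contoso.info"),
    "two-character TLD": matches(r"contoso\.\w\w\w$", "contoso.ca"),
}

for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

The "unescaped period" entry shows why the escape operator matters: without it, the rule accepts strings that are not the intended domain at all.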

Regular expressions, abbreviated as RegEx, are a standard formal language used in many systems and programming languages. Used properly, they are a powerful way to describe text patterns. You can find more information, including syntax definitions, examples, and tutorials, on many Web sites.

 