Compilation and Reuse in Regular Expressions

You can optimize the performance of applications that make extensive use of regular expressions by understanding how the regular expression engine compiles expressions and by understanding how regular expressions are cached. This topic discusses both compilation and caching.

Compiled Regular Expressions

By default, the regular expression engine compiles a regular expression to a sequence of internal instructions (these are high-level codes that are different from Microsoft intermediate language, or MSIL). When the engine executes a regular expression, it interprets the internal codes.

If a Regex object is constructed with the RegexOptions.Compiled option, it compiles the regular expression to explicit MSIL code instead of high-level regular expression internal instructions. This allows .NET's just-in-time (JIT) compiler to convert the expression to native machine code for higher performance. The cost of constructing the Regex object may be higher, but the cost of performing matches with it is likely to be much smaller.

An alternative is to use precompiled regular expressions. You can compile all of your expressions into a reusable DLL by using the CompileToAssembly method. This avoids the need to compile at run time while still benefiting from the speed of compiled regular expressions.

The Regular Expressions Cache

To improve performance, the regular expression engine maintains an application-wide cache of compiled regular expressions. The cache stores regular expression patterns that are used only in static method calls. (Regular expression patterns supplied to instance methods are not cached.) This avoids the need to reparse an expression into high-level byte code each time it is used.

The maximum number of cached regular expressions is determined by the value of the static (Shared in Visual Basic) Regex.CacheSize property. By default, the regular expression engine caches up to 15 compiled regular expressions. If the number of compiled regular expressions exceeds the cache size, the least recently used regular expression is discarded and the new regular expression is cached.

Your application can reuse regular expressions in one of the following two ways:

  • By using a static method of the Regex object to define the regular expression. If you're using a regular expression pattern that has already been defined by another static method call, the regular expression engine will try to retrieve it from the cache. If it's not available in the cache, the engine will compile the regular expression and add it to the cache.

  • By reusing an existing Regex object as long as its regular expression pattern is needed.

Because of the overhead of object instantiation and regular expression compilation, creating and rapidly destroying numerous Regex objects is a very expensive process. For applications that use a large number of different regular expressions, you can optimize performance by using calls to static Regex methods and possibly by increasing the size of the regular expression cache.

See also