Performance comparison of Regex in .NET

April 12th 2024 C# .NET BenchmarkDotNet

If you've used regular expressions a lot in .NET, you most likely already know that the RegexOptions.Compiled option can significantly improve run-time performance at the cost of a longer initialization time. What you might not know is that since .NET 7 you can use a source generator instead and avoid that initialization cost.

The simplest way to use a regular expression for validating an input is by calling a static method on the Regex class:

Regex.IsMatch(input, pattern);

Since regular expressions are parsed and transformed into an optimized tree format that can be more efficiently interpreted, one would think that creating and reusing a static instance instead would benefit performance:

private static readonly Regex regex = new(pattern);

regex.IsMatch(input);

However, the regular expression engine caches the transformed regular expressions when static methods are used, which significantly reduces the performance difference between the two approaches.
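
The cache behind the static methods is bounded, which is worth keeping in mind. Here is a minimal sketch, assuming default settings: Regex.CacheSize is the real property that controls how many patterns are kept (15 unless you change it), while the pattern and input are placeholders made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Placeholder pattern and input, chosen only for illustration.
const string pattern = @"^\d{3}-\d{4}$";
const string input = "555-1234";

// The first call parses the pattern and stores the result in the static cache.
bool first = Regex.IsMatch(input, pattern);

// Later calls with the same pattern and options can reuse the cached entry.
bool second = Regex.IsMatch(input, pattern);

// The cache holds a limited number of entries.
Console.WriteLine($"Cache size: {Regex.CacheSize}");
Console.WriteLine($"Matches: {first}, {second}");
```

If an application cycles through more distinct patterns than the cache can hold, the static methods have to parse evicted patterns again, and a reused instance is likely to regain some of its advantage.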

To achieve even better performance, you need to use compiled regular expressions:

private static readonly Regex compiledRegex = new(pattern, RegexOptions.Compiled);

compiledRegex.IsMatch(input);

This instructs the regular expression engine to replace the tree representation with its compiled IL (intermediate language) counterpart. This significantly improves the run-time performance of regular expression matching, but requires a longer initialization time before the first use, because the IL instructions have to be emitted first.
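
To get a rough feel for that initialization cost, you can time the construction of both kinds of instances. This is only a sketch: the pattern is a placeholder, and a single Stopwatch measurement is noisy and includes JIT warm-up of the regular expression engine itself, so treat the numbers as an indication rather than a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Placeholder pattern, chosen only for illustration.
const string pattern = @"^[\w.+-]+@[\w-]+\.[\w.-]+$";

// Time the construction of an interpreted instance.
var stopwatch = Stopwatch.StartNew();
var interpreted = new Regex(pattern);
stopwatch.Stop();
Console.WriteLine($"Interpreted construction: {stopwatch.Elapsed.TotalMilliseconds:F3} ms");

// Time the construction of a compiled instance, which also has to emit IL.
stopwatch.Restart();
var compiled = new Regex(pattern, RegexOptions.Compiled);
stopwatch.Stop();
Console.WriteLine($"Compiled construction: {stopwatch.Elapsed.TotalMilliseconds:F3} ms");

// Both instances give the same answer; only the cost profile differs.
bool sameResult = interpreted.IsMatch("user@example.com") == compiled.IsMatch("user@example.com");
Console.WriteLine($"Same result: {sameResult}");
```

Some of the compiled cost may also show up on the first match rather than in the constructor, since the emitted IL still has to be JIT-compiled.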

Since .NET 7, there is an even better alternative available: a source generator that generates the code at compile time instead of run time. There's a preconfigured code analyzer to inform you about this new feature:

[Image: code analyzer informing about the Regex source generator]

SYSLIB1045: Use GeneratedRegexAttribute to generate the regular expression implementation at compile-time.

You can learn more about the regular expression source generator here. It's pretty easy to switch to using it:

  • make your class partial,
  • change your Regex field to a partial method and add the GeneratedRegex attribute to it:

    [GeneratedRegex(pattern)]
    private static partial Regex SourceGeneratedRegex();
    
    SourceGeneratedRegex().IsMatch(input);
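
Put together, the two steps above amount to something like the following complete program. It's a minimal sketch that assumes a project targeting .NET 7 or later; the InputValidator class, its IsValid method and the pattern are hypothetical names invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(InputValidator.IsValid("555-1234"));
Console.WriteLine(InputValidator.IsValid("not a number"));

// The class must be partial so that the generator can supply the method body.
public static partial class InputValidator
{
    // Placeholder pattern, chosen only for illustration.
    // It must be a compile-time constant to be usable in the attribute.
    private const string Pattern = @"^\d{3}-\d{4}$";

    // The source generator implements this method at compile time.
    [GeneratedRegex(Pattern)]
    private static partial Regex SourceGeneratedRegex();

    public static bool IsValid(string input) => SourceGeneratedRegex().IsMatch(input);
}
```

Because the pattern is passed as an attribute argument, it has to be a constant. Patterns that are only known at run time still need one of the other approaches.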
    

In addition to avoiding the initialization cost of compiled regular expressions before the first use, this also slightly improves their performance on subsequent uses.

Here's a performance comparison of all four approaches described above, measured with BenchmarkDotNet (the initialization time before the first call is not measured):

| Method                            | Mean     | Error    | StdDev   |
|-----------------------------------|----------|----------|----------|
| ValidateUsingSingleUseRegex       | 71.96 ns | 0.301 ns | 0.267 ns |
| ValidateUsingStaticRegex          | 68.44 ns | 0.397 ns | 0.331 ns |
| ValidateUsingStaticCompiledRegex  | 24.49 ns | 0.070 ns | 0.066 ns |
| ValidateUsingSourceGeneratedRegex | 20.34 ns | 0.048 ns | 0.042 ns |
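
A benchmark class along the following lines would produce results of that shape. Treat it as a sketch, not the exact code from the repository: the method names are taken from the results above, but the RegexBenchmarks class name, the pattern, the input and the mapping of ValidateUsingSingleUseRegex to the static Regex.IsMatch call are assumptions made for illustration. It needs the BenchmarkDotNet NuGet package and should be run in the Release configuration.

```csharp
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<RegexBenchmarks>();

public partial class RegexBenchmarks
{
    // Placeholder pattern and input; not necessarily the ones behind the published numbers.
    private const string Pattern = @"^\d{3}-\d{4}$";
    private const string Input = "555-1234";

    // Instances are created once, so their construction is not part of the measurement.
    private static readonly Regex StaticRegex = new(Pattern);
    private static readonly Regex CompiledRegex = new(Pattern, RegexOptions.Compiled);

    [GeneratedRegex(Pattern)]
    private static partial Regex SourceGeneratedRegex();

    // Assumed to correspond to the static method call, which relies on the internal cache.
    [Benchmark]
    public bool ValidateUsingSingleUseRegex() => Regex.IsMatch(Input, Pattern);

    [Benchmark]
    public bool ValidateUsingStaticRegex() => StaticRegex.IsMatch(Input);

    [Benchmark]
    public bool ValidateUsingStaticCompiledRegex() => CompiledRegex.IsMatch(Input);

    [Benchmark]
    public bool ValidateUsingSourceGeneratedRegex() => SourceGeneratedRegex().IsMatch(Input);
}
```

Absolute numbers depend on the pattern, the input and the machine, so only the relative ordering is likely to carry over to other setups.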

If you want to run the benchmark yourself, you can find the full source code in my GitHub repository.

The regular expression source generator is a great alternative to compiled regular expressions. If the regular expressions you're using are known at compile time, I can't think of a reason not to switch to it. You get even better performance without any extra initialization time before the first use.
