Catching common C# string performance fixes with Semgrep

[Illustration: app performance decreasing because of a pileup of many small issues]

Following up on my previous post, I’ve put together a new set of Semgrep rules focused specifically on string-related performance issues in C#.

These are the kinds of things that rarely show up in code reviews, and not all of them are flagged by ReSharper or the IDE. They add up, though, especially in a hot path. String operations can also cause a lot of allocations, which means more work for the garbage collector. The rules are designed to be lightweight, easy to integrate into your workflow, and to catch the subtle inefficiencies that can quietly degrade performance over time. Some of them have autofixes, meaning you can run the rules against your code base and Semgrep will rewrite the offending code for you. This is still a work in progress, but let’s go through the rules.

1. String Comparison

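Calling ToLower() or ToUpper() just to compare strings is wasteful: it allocates a new string, converts every character, and only then compares. Use string.Equals(str1, str2, StringComparison.OrdinalIgnoreCase) to compare the strings without creating any temporary strings. Note that the fix is only equivalent when the other string is already in the expected case: str1.ToLower().Equals(str2) is still case-sensitive with respect to str2, and the parameterless ToLower()/ToUpper() use the current culture, while OrdinalIgnoreCase does not. ReSharper does not flag this.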

public bool ToLower_Different()
{
    // Here ToLower allocates a new string.
    return TestString1.ToLower().Equals(TestString2);
}

public bool StringEquals_OrdinalIgnoreCase_SameIgnoreCase()
{
    // Here we compare without allocating new strings
    return string.Equals(TestString1, TestString2,
                         StringComparison.OrdinalIgnoreCase);
}
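A minimal sketch of the two approaches as a C# top-level program, including the case where they disagree. The helper names ToLowerEquals and EqualsIgnoreCase are illustrative and not taken from the benchmark code:

```csharp
using System;

// One-sided ToLower(): allocates a lower-cased copy of 'a' (using the current
// culture), then compares that copy case-sensitively (ordinally) against 'b'.
static bool ToLowerEquals(string a, string b) => a.ToLower().Equals(b);

// No temporary strings: compares the two strings ordinally while ignoring case.
static bool EqualsIgnoreCase(string a, string b) =>
    string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

Console.WriteLine(ToLowerEquals("Hello", "hello"));    // True
Console.WriteLine(EqualsIgnoreCase("Hello", "hello")); // True
Console.WriteLine(ToLowerEquals("Hello", "HELLO"));    // False: only one side was lower-cased
Console.WriteLine(EqualsIgnoreCase("Hello", "HELLO")); // True
```

The last two lines are why it is worth glancing at each autofix: if the original code relied on the other string being case-sensitive, the rewrite changes behavior.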

csharp-inefficient-string-comparison.yaml

rules:
  - id: csharp-inefficient-string-comparison
    patterns:
      - pattern-either:
          - pattern: $STR.ToLower().Equals($OTHER)
          - pattern: $STR.ToLowerInvariant().Equals($OTHER)
          - pattern: $STR.ToUpper().Equals($OTHER)
          - pattern: $STR.ToUpperInvariant().Equals($OTHER)
      - pattern-not: String.Equals($STR, $OTHER, StringComparison.OrdinalIgnoreCase)
    message: >
      Inefficient string comparison. Use String.Equals(s1, s2, StringComparison.OrdinalIgnoreCase) 
      instead of ToLower()/ToUpper().Equals() for better performance and clarity.
    fix: String.Equals($STR, $OTHER, StringComparison.OrdinalIgnoreCase)
    languages: [csharp]
    severity: WARNING
    metadata:
      category: performance
      subcategory:
      - easyfix
      - strings
      references:
      - "https://blog.smistad.me/semgrep-rules-for-c-performance/"

2. Avoid string.Format for cases where interpolation is enough

string.Format has to parse its format string at runtime and is harder to read. Interpolation ($"...") avoids that parsing in simple cases where every placeholder just inserts a value, and it reads better. For more complex formatting you can keep using string.Format, but where it is only used for basic string concatenation you should switch to string interpolation. ReSharper suggests this fix when you use string.Format, but not when you use string.Concat.

public string Format()
{
    // ReSharper suggests switching to interpolation
    return string.Format("{0} {1} {2}", Left, Right, Middle);
}

public string Interpolation()
{
    return $"{Left} {Right} {Middle}";
}

public string Concat()
{
    // No suggestion to fix this from ReSharper
    return string.Concat(Left, " ", Right, " ", Middle);
}
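| Method        | Mean       | Ratio | Allocated |
|---------------|-----------:|------:|----------:|
| Interpolation |  0.4472 ns |  1.00 |         - |
| Concat        | 19.2632 ns | 43.16 |      56 B |
| Format        | 44.2375 ns | 99.12 |      56 B |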

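Interpolation is far faster than the alternatives in this benchmark and allocates nothing. That is most likely the compiler at work rather than a faster runtime path: if Left, Right and Middle are const strings, C# 10 and later fold the interpolated string into a single constant, so nothing is left to do at runtime. With values that are only known at runtime, simple string-only interpolation is typically compiled into a string.Concat call, so expect it to perform roughly like the Concat version, which is still well ahead of string.Format here.

The saving over string.Format comes from not having to parse the format string at runtime, which only holds in the simple cases where each placeholder just refers to a variable. So the improvement only really applies there, and that is all the rule tries to catch.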
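A sketch of both situations, assuming C# 10 or later. The post does not show how Left, Right and Middle are declared, so the const declarations below are an assumption, as are the helper names ViaFormat, ViaInterpolation and ViaConcat:

```csharp
using System;

// Assumed const declarations: with C# 10+, an interpolated string whose holes
// are all const strings is itself a compile-time constant.
const string Left = "left";
const string Right = "right";
const string Middle = "middle";
const string Folded = $"{Left} {Right} {Middle}";

// With runtime values nothing can be folded. string.Format parses "{0} {1} {2}"
// on every call; string-only interpolation is typically lowered to string.Concat.
static string ViaFormat(string a, string b, string c) => string.Format("{0} {1} {2}", a, b, c);
static string ViaInterpolation(string a, string b, string c) => $"{a} {b} {c}";
static string ViaConcat(string a, string b, string c) => string.Concat(a, " ", b, " ", c);

// Build the inputs at runtime so the compiler cannot treat them as constants.
string[] parts = "left,right,middle".Split(',');

Console.WriteLine(Folded);                                         // left right middle
Console.WriteLine(ViaFormat(parts[0], parts[1], parts[2]));        // left right middle
Console.WriteLine(ViaInterpolation(parts[0], parts[1], parts[2])); // left right middle
Console.WriteLine(ViaConcat(parts[0], parts[1], parts[2]));        // left right middle
```

All four produce the same string; only how much work happens at runtime differs. If the benchmark fields turn out to be constants, declaring them as static readonly fields instead would stop the folding and measure runtime behavior.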

csharp-string-format-to-interpolation.yaml

The regex that excludes more complex format items, such as {0:N2}, does not catch every case yet, so this rule can report some false positives.

rules:
  - id: csharp-string-format-to-interpolation
    languages: [csharp]
    severity: WARNING
    message: "Use string interpolation ($\"...\") instead of string.Format for simple cases"
    metadata:
      description: "Detects simple string.Format calls that could be replaced with string interpolation"
      category: "performance"
      references:
        - "https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated"
        - "https://blog.smistad.me/semgrep-rules-for-c-performance/"
      technology:
        - csharp
      subcategory:
        - easyfix
        - strings
    patterns:
      - pattern-either:
          - pattern: string.Format("$FMT", $A1)
          - pattern: string.Format("$FMT", $A1, $A2)
          - pattern: string.Format("$FMT", $A1, $A2, $A3)
          - pattern: string.Format("$FMT", $A1, $A2, $A3, $A4)
      # Skip format items with a format string or alignment, e.g. {0:N2} or {0,10}
      - pattern-not-regex: \{\d+\s*[,:][^}]*\}

3. Use AsSpan() Instead of Substring()

In some cases we can skip the new string that string.Substring() allocates and use .AsSpan() instead, which returns a ReadOnlySpan<char> that points into the original string.

Typical cases are arguments to int/double/Guid.Parse(), which all have ReadOnlySpan<char> overloads, and comparisons of a substring against a string literal.

// allocates new string
int.Parse(tName.Substring("VariantArray".Length));
if (s.Substring(i) == "INF")

// Using AsSpan()
int.Parse(tName.AsSpan("VariantArray".Length));
if (s.AsSpan(i).SequenceEqual("INF"))
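A small sketch of the two AsSpan() patterns as a C# top-level program. ParseAfterPrefix and IsInfAt are hypothetical helpers that mirror the tName and s snippets:

```csharp
using System;

// Hypothetical helper: parses the number after a known prefix, e.g.
// ("VariantArray12", "VariantArray") -> 12, without allocating the suffix string.
static int ParseAfterPrefix(string name, string prefix)
{
    // AsSpan(start) returns a ReadOnlySpan<char> view over the original string.
    ReadOnlySpan<char> digits = name.AsSpan(prefix.Length);
    // int.Parse has a ReadOnlySpan<char> overload, so no temporary string is needed.
    return int.Parse(digits);
}

// Hypothetical helper: true when the text from index i to the end is exactly "INF".
static bool IsInfAt(string s, int i) => s.AsSpan(i).SequenceEqual("INF");

Console.WriteLine(ParseAfterPrefix("VariantArray12", "VariantArray")); // 12
Console.WriteLine(IsInfAt("-INF", 1));      // True
Console.WriteLine(IsInfAt("-INFINITY", 1)); // False
```

Like Substring(i), the span covers everything from i to the end, so "-INFINITY" does not match.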

For the Substring() calls, ReSharper suggests using a range index instead, but in this benchmark that was actually slower than calling Substring(), and it does not avoid the allocation either, which .AsSpan() does.

[Benchmark(Baseline = true)]
public string Substring()
{
    return "this is my wonderful string".Substring("this".Length);
}

[Benchmark]
public ReadOnlySpan<char> AsSpan()
{
    return "this is my wonderful string".AsSpan("this".Length);
}

[Benchmark]
public string RangeIndex()
{
    // ReSharper will suggest changing your code to this
    return "this is my wonderful string"["this".Length..];
}
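| Method     | Mean      | Ratio | Allocated |
|------------|----------:|------:|----------:|
| AsSpan     | 0.2085 ns |  0.05 |         - |
| Substring  | 4.5797 ns |  1.00 |      72 B |
| RangeIndex | 6.5919 ns |  1.44 |      72 B |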

csharp-substring.yaml

rules:
  - id: csharp-avoid-substring-for-span-accepting-methods
    languages: [csharp]
    message: Use AsSpan instead of Substring to avoid string allocations when passing to methods accepting ReadOnlySpan<char>.
    severity: WARNING
    metadata:
      category: performance
      subcategory:
        - easyfix
        - strings
      likelihood: LOW
      impact: LOW
    patterns:
      - pattern: $METHOD($STR.Substring($IDX))
      - metavariable-regex:
          metavariable: $METHOD
          regex: >-
            (int|float|double|decimal|uint|long|bool|Guid|DateTime|DateTimeOffset)\.(Parse(Exact)?|TryParse(Exact)?)
    fix: $METHOD($STR.AsSpan($IDX))
  - id: csharp-avoid-substring-for-suffix
    pattern: $STR.Substring($IDX)
    message: Use AsSpan instead of Substring to avoid string allocations.
    languages: [csharp]
    severity: INFO
    metadata:
      category: performance
      subcategory:
      - easyfix
      - strings
      likelihood: LOW
      impact: LOW
    fix: $STR.AsSpan($IDX)

  - id: csharp-avoid-substring-equals
    pattern: $STR.Substring($IDX) == "$SUFFIX"
    message: Use AsSpan(...).SequenceEqual("...") instead of Substring == "..." for performance.
    languages: [csharp]
    severity: WARNING
    metadata:
      category: performance
      subcategory:
      - easyfix
      - strings
      likelihood: LOW
      impact: LOW
    fix: $STR.AsSpan($IDX).SequenceEqual("$SUFFIX")

4. Optimize UTF-8 Transcoding

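This one does not avoid any allocations, since ToArray() still creates a new byte[], but it saves some CPU cycles: the bytes of a u8 literal (C# 11 and later) are encoded to UTF-8 at compile time instead of being transcoded at runtime. If a ReadOnlySpan<byte> is all you need, you can drop the ToArray() call and avoid the allocation too. This is also caught by ReSharper.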

return Encoding.UTF8.GetBytes("ThIs A StRiNG");
// Can be shortened to this:
return "ThIs A StRiNG"u8.ToArray();
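A sketch comparing the variants, assuming C# 11 or later for u8 literals. The helper names ViaEncoding, ViaU8Array and ViaU8Span are illustrative:

```csharp
using System;
using System.Text;

// Runtime transcoding: the UTF-16 string is converted to UTF-8 on every call.
static byte[] ViaEncoding() => Encoding.UTF8.GetBytes("ThIs A StRiNG");

// u8 literal: the UTF-8 bytes are produced at compile time and stored in the
// assembly; ToArray() still copies them into a new byte[] on every call.
static byte[] ViaU8Array() => "ThIs A StRiNG"u8.ToArray();

// If the caller only needs to read the bytes, returning the span skips the copy.
static ReadOnlySpan<byte> ViaU8Span() => "ThIs A StRiNG"u8;

Console.WriteLine(ViaEncoding().AsSpan().SequenceEqual(ViaU8Array())); // True
Console.WriteLine(ViaU8Span().Length);                                 // 13
```

All three yield the same bytes; the span version is the only one that allocates nothing.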

csharp-avoid-transcoding.yaml

rules:
- id: csharp-avoid-transcoding
  patterns:
  - pattern-either:
    - pattern: Encoding.UTF8.GetBytes("$STR")
  message: Use a u8 string literal to avoid transcoding to UTF-8 at runtime
  fix: '"$STR"u8.ToArray()'
  languages: [csharp]
  severity: WARNING
  metadata:
    category: performance
    subcategory:
    - easyfix
    - strings
    likelihood: LOW
    impact: LOW

Want to see all the rules and benchmarks? Check out csharp-semgrep-performance on GitHub.
