Tag: semgrep

  • Catching common C# string performance fixes with Semgrep

    Following up on my previous post, I’ve put together a new set of Semgrep rules focused specifically on string-related performance issues in C#.

    These are the kinds of things that rarely show up in code reviews, and not all of them are covered by Resharper or IDEs. The issues add up, though, especially if they are in a hot path. String operations can also cause a lot of allocations, leading to plenty of work for the garbage collector. These rules are designed to be lightweight, easy to integrate into your workflow, and catch the kind of subtle inefficiencies that can quietly degrade performance over time. Some of them have autofixes, meaning you can run the rules against your code base and have the matches rewritten for you. This is still a work in progress, but let’s go through the rules.

    1. String Comparison

    Calling ToLower() or ToUpper() just to compare strings is wasteful: it allocates a new string, converts every character, and only then compares. Use string.Equals(str1, str2, StringComparison.OrdinalIgnoreCase) to compare the strings without creating any temporary strings. Resharper does not flag this.

    public bool ToLower_Different()
    {
        // Here ToLower allocates a new string.
        return TestString1.ToLower().Equals(TestString2);
    }

    public bool StringEquals_OrdinalIgnoreCase_SameIgnoreCase()
    {
        // Here we compare without allocating new strings.
        return string.Equals(TestString1, TestString2,
                             StringComparison.OrdinalIgnoreCase);
    }
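
    One thing to keep in mind before applying the autofix: the two forms are not strictly equivalent. ToLower().Equals(other) only lowercases the left-hand side, while OrdinalIgnoreCase ignores case on both sides. A minimal sketch of the difference (ComparisonDemo and its method names are made up for illustration):

```csharp
using System;

public static class ComparisonDemo
{
    // The pattern the rule flags: lowercases the left side only,
    // then does an exact (ordinal) comparison against the right side.
    public static bool ViaToLower(string a, string b) => a.ToLower().Equals(b);

    // The replacement the rule suggests: ignores case on both sides
    // and does not create a temporary string.
    public static bool ViaOrdinalIgnoreCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
}
```

    With these helpers, ViaToLower("Hello", "HELLO") is false while ViaOrdinalIgnoreCase("Hello", "HELLO") is true, so the fix can change behavior when the right-hand side is not already lowercase. ToLower() is also culture-sensitive, which the ordinal comparison is not.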

    csharp-inefficient-string-comparison.yaml

    rules:
      - id: csharp-inefficient-string-comparison
        patterns:
          - pattern-either:
              - pattern: $STR.ToLower().Equals($OTHER)
              - pattern: $STR.ToLowerInvariant().Equals($OTHER)
              - pattern: $STR.ToUpper().Equals($OTHER)
              - pattern: $STR.ToUpperInvariant().Equals($OTHER)
          - pattern-not: String.Equals($STR, $OTHER, StringComparison.OrdinalIgnoreCase)
        message: >
          Inefficient string comparison. Use String.Equals(s1, s2, StringComparison.OrdinalIgnoreCase) 
          instead of ToLower()/ToUpper().Equals() for better performance and clarity.
        fix: String.Equals($STR, $OTHER, StringComparison.OrdinalIgnoreCase)
        languages: [csharp]
        severity: WARNING
        metadata:
          category: performance
          subcategory:
          - easyfix
          - strings
          references:
          - "https://blog.smistad.me/semgrep-rules-for-c-performance/"

    2. Avoid string.Format for cases where interpolation is enough

    string.Format adds overhead and is harder to read. Interpolation ($"...") is faster and, in simple cases, allocation-free. For more complex formatting you should continue to use string.Format, but where it is used for basic string concatenation you should switch to string interpolation. Resharper suggests this fix if you use string.Format, but not if you use string.Concat.

    public string Format()
    {
        // Resharper suggests switching to interpolation
        return string.Format("{0} {1} {2}", Left, Right, Middle);
    }
    
    public string Interpolation()
    {
        return $"{Left} {Right} {Middle}";
    }
    
    public string Concat()
    {
        // No suggestion to fix this from Resharper
        return string.Concat(Left, " ", Right, " ", Middle);
    }
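    | Method        | Mean       | Ratio | Allocated |
    |-------------- |-----------:|------:|----------:|
    | Interpolation |  0.4472 ns |  1.00 |         - |
    | Concat        | 19.2632 ns | 43.16 |      56 B |
    | Format        | 44.2375 ns | 99.12 |      56 B |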

    Interpolation is much faster than the alternatives. The benchmark here is the code shown above, so I guess the interpolation just gets optimized away in the end. The same should apply to your actual code where you use string.Format or string.Concat for simple string concatenation. It also causes fewer allocations than the alternatives.

    The reason is that interpolation avoids parsing the format string at runtime in the cases where you just refer to a variable. So this improvement only really applies to simple use cases.

    csharp-string-format-to-interpolation.yaml

    The regex for detecting more complex format parameters is not quite working yet, so this rule currently picks up some false positives.

    rules:
      - id: csharp-string-format-to-interpolation
        languages: [csharp]
        severity: WARNING
        message: "Use string interpolation ($\"...\") instead of string.Format for simple cases"
        metadata:
          description: "Detects simple string.Format calls that could be replaced with string interpolation"
          category: "performance"
          references:
            - "https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated"
            - "https://blog.smistad.me/semgrep-rules-for-c-performance/"
          technology:
            - csharp
          subcategory:
            - easyfix
            - strings
        pattern-either:
          - pattern: string.Format("$FMT", $A1)
          - pattern: string.Format("$FMT", $A1, $A2)
          - pattern: string.Format("$FMT", $A1, $A2, $A3)
          - pattern: string.Format("$FMT", $A1, $A2, $A3, $A4)
        pattern-not-regex: \{\d+:[^}]+\}
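
    To see what the pattern-not-regex above does and does not recognize, the same expression can be tried against a few format strings in C#. This is only a sketch: FormatSpecifierCheck is a made-up helper, and .NET's regex engine is not the one Semgrep uses, although an expression this simple should behave the same way in both. It shows one plausible gap, not necessarily the one behind the false positives mentioned above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormatSpecifierCheck
{
    // Same expression as the rule's pattern-not-regex:
    // an opening brace, an index, a colon, a format specifier, a closing brace.
    private static readonly Regex HasFormatSpecifier = new Regex(@"\{\d+:[^}]+\}");

    // True when the format string would be excluded by the regex.
    public static bool IsExcluded(string format) => HasFormatSpecifier.IsMatch(format);
}
```

    IsExcluded("{0:N2}") returns true, but IsExcluded("{0,10}") and IsExcluded("{0,10:N2}") return false, because the expression expects a colon directly after the index. Placeholders with an alignment component therefore slip past this filter.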
    

    3. Use AsSpan() Instead of Substring()

    In some cases we can avoid allocating a new string with string.Substring() and use .AsSpan() instead.

    Typical cases are arguments to the int/double/Guid.Parse() methods, or comparing a substring to a string literal.

    // allocates new string
    int.Parse(tName.Substring("VariantArray".Length));
    if (s.Substring(i) == "INF")
    
    // Using AsSpan()
    int.Parse(tName.AsSpan("VariantArray".Length));
    if (s.AsSpan(i).SequenceEqual("INF"))
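
    As a compilable sketch of the two cases above (SpanSuffixDemo and its method names are made up for illustration, and it assumes a runtime where int.Parse accepts a ReadOnlySpan<char>, i.e. .NET Core 2.1 or later). The third helper is there as a reminder that a ReadOnlySpan<char> cannot simply replace a string everywhere, so a blanket Substring-to-AsSpan fix deserves a look before it is applied:

```csharp
using System;

public static class SpanSuffixDemo
{
    // Parses the digits after a fixed prefix without allocating a substring.
    public static int ParseIndex(string typeName, string prefix) =>
        int.Parse(typeName.AsSpan(prefix.Length));

    // Compares the tail of a string with an expected value;
    // the string argument converts implicitly to ReadOnlySpan<char>.
    public static bool TailIs(string s, int start, string expected) =>
        s.AsSpan(start).SequenceEqual(expected);

    // When the caller needs an actual string (to store or return it),
    // Substring and its allocation are still the right tool.
    public static string TailAsString(string s, int start) => s.Substring(start);
}
```

    For example, ParseIndex("VariantArray12", "VariantArray") gives 12, and TailIs("-INF", 1, "INF") is true.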

    Here Resharper will suggest using a range index instead of Substring, but this is actually slower than calling Substring. You do not avoid any allocations either, as you do with .AsSpan().

    [Benchmark(Baseline = true)]
    public string Substring()
    {
        return "this is my wonderful string".Substring("this".Length);
    }
    
    [Benchmark]
    public ReadOnlySpan<char> AsSpan()
    {
        return "this is my wonderful string".AsSpan("this".Length);
    }
    
    [Benchmark]
    public string RangeIndex()
    {
        // Resharper will suggest changing your code to this
        return "this is my wonderful string"["this".Length..];
    }
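    | Method     | Mean      | Ratio | Allocated |
    |----------- |----------:|------:|----------:|
    | AsSpan     | 0.2085 ns |  0.05 |         - |
    | Substring  | 4.5797 ns |  1.00 |      72 B |
    | RangeIndex | 6.5919 ns |  1.44 |      72 B |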

    csharp-substring.yaml

    rules:
      - id: csharp-avoid-substring-for-span-accepting-methods
        languages: [csharp]
        message: Use AsSpan instead of Substring to avoid string allocations when passing to methods accepting ReadOnlySpan<char>.
        severity: WARNING
        metadata:
          category: performance
          subcategory:
            - easyfix
            - strings
          likelihood: LOW
          impact: LOW
        patterns:
          - pattern: $METHOD($STR.Substring($IDX))
          - metavariable-regex:
              metavariable: $METHOD
              regex: >
                (int|float|double|decimal|uint|long|bool|Guid|DateTime|DateTimeOffset)\.(Parse(Exact)?|TryParse(Exact)?)
        fix: $METHOD($STR.AsSpan($IDX))
      - id: csharp-avoid-substring-for-suffix
        pattern: $STR.Substring($IDX)
        message: Use AsSpan instead of Substring to avoid string allocations.
        languages: [csharp]
        severity: INFO
        metadata:
          category: performance
          subcategory:
          - easyfix
          - strings
          likelihood: LOW
          impact: LOW
        fix: $STR.AsSpan($IDX)
    
      - id: csharp-avoid-substring-equals
        pattern: $STR.Substring($IDX) == "$SUFFIX"
        message: Use AsSpan(...).SequenceEqual("...") instead of Substring == "..." for performance.
        languages: [csharp]
        severity: WARNING
        metadata:
          category: performance
          subcategory:
          - easyfix
          - strings
          likelihood: LOW
          impact: LOW
        fix: $STR.AsSpan($IDX).SequenceEqual("$SUFFIX")

    4. Optimize UTF-8 Transcoding

    This one does not avoid any allocations, but it saves some CPU cycles. It is also caught by Resharper.

    return Encoding.UTF8.GetBytes("ThIs A StRiNG");
    // Can be shortened to this:
    return "ThIs A StRiNG"u8.ToArray();
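
    To check that the two forms really produce the same bytes, here is a small sketch (Utf8Demo is a made-up name, and u8 literals need C# 11 or later). It also includes a variant without ToArray(), assuming the consumer accepts a ReadOnlySpan<byte>, which is the case where the allocation goes away as well:

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // Transcodes the UTF-16 string to UTF-8 at runtime.
    public static byte[] ViaEncoding() => Encoding.UTF8.GetBytes("ThIs A StRiNG");

    // The UTF-8 bytes are produced at compile time; ToArray() copies them into a new array.
    public static byte[] ViaLiteral() => "ThIs A StRiNG"u8.ToArray();

    // No copy at all: the span points at the bytes embedded in the assembly.
    public static ReadOnlySpan<byte> NoCopy() => "ThIs A StRiNG"u8;
}
```

    Utf8Demo.ViaEncoding().AsSpan().SequenceEqual(Utf8Demo.ViaLiteral()) is true, so the rewrite should not change the resulting bytes.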

    csharp-avoid-transcoding.yaml

    rules:
    - id: csharp-avoid-transcoding
      patterns:
      - pattern-either:
        - pattern: Encoding.UTF8.GetBytes("$STR")
      message: Use a u8 string literal to avoid transcoding at runtime
      fix: '"$STR"u8.ToArray()'
      languages: [csharp]
      severity: WARNING
      metadata:
        category: performance
        subcategory:
        - easyfix
        - strings
        likelihood: LOW
        impact: LOW

    Want to see all the rules and benchmarks? Check out csharp-semgrep-performance on GitHub.

  • Semgrep rules for C# performance

    Performance isn’t just about making users happy, though. It makes our lives as developers way better too. Think about your typical day: you’re constantly running your code, debugging, testing, and then doing it all over again. When your code runs faster, you spend less time waiting and more time actually coding. Nobody enjoys staring at a spinning wheel while the tests run or the debugger loads up. Those small delays mess with your flow and make development less fun.

    I was thinking of using Semgrep to catch a lot of small, easy-to-fix performance improvements. So I want to share the rules I make, so maybe somebody else can use them too.

    These rules will cover the small cases, but sometimes performance issues are death by a thousand cuts. Garbage collection can be a real killer for performance, so a lot of the rules will try to cover things where there is an alternative that requires fewer or no allocations.

    Stop Converting Strings Just to Compare Them

    Strings are everywhere in our code, and the way we compare them can make a surprising difference in performance. Here’s our first rule that catches a really common mistake.

    We’ve all done this at some point:

    // The slow way
    if (someString.ToLower().Equals(otherString))
    {
        // Do something
    }
    

    Or maybe this version:

    // Also slow
    if (someString.ToUpper().Equals(otherString))
    {
        // Do something
    }
    

    What’s wrong with this? A few things:

    • It creates a whole new string just for the comparison
    • It wastes memory for this temporary string
    • It has to convert every character before it even starts comparing

    The Better Way

    There’s a much faster way to do the same thing in C#:

    if (String.Equals(someString, otherString, StringComparison.OrdinalIgnoreCase))
    {
        // Do something
    }
    

    This skips creating new strings completely and just does the comparison directly.

    The Semgrep Rule

    Here’s the rule I made to catch this in your code:

    rules:
      - id: csharp-inefficient-string-comparison
        patterns:
          - pattern-either:
              - pattern: $STR.ToLower().Equals($OTHER)
              - pattern: $STR.ToLowerInvariant().Equals($OTHER)
              - pattern: $STR.ToUpper().Equals($OTHER)
              - pattern: $STR.ToUpperInvariant().Equals($OTHER)
          - pattern-not: String.Equals($STR, $OTHER, StringComparison.OrdinalIgnoreCase)
        message: >
          Inefficient string comparison. Use String.Equals(s1, s2, StringComparison.OrdinalIgnoreCase) 
          instead of ToLower()/ToUpper().Equals() for better performance and clarity.
        languages: [csharp]
        severity: WARNING
        metadata:
          category: performance
          subcategory:
          - easyfix
          - strings
    

    This catches all four ways people typically do the slow comparison, but it won’t bug you if you’re already doing it the right way.

    How Much Faster Is It Really?

    I ran some benchmarks to see exactly how big the difference is:

    | Method                                        | Mean       | Allocated |
    |---------------------------------------------- |-----------:|----------:|
    | StringEquals_OrdinalIgnoreCase_SameIgnoreCase | 0.0138 ns | - |
    | StringEquals_OrdinalIgnoreCase_Different | 0.0327 ns | - |
    | ToUpperInvariant_Different | 16.4798 ns | 56 B |
    | ToLowerInvariant_Different | 16.6340 ns | 56 B |
    | ToLowerInvariant_SameIgnoreCase | 17.4380 ns | 56 B |
    | ToUpper_Different | 18.7223 ns | 56 B |
    | ToLower_Different | 19.5512 ns | 56 B |
    | ToLower_SameIgnoreCase | 21.0037 ns | 56 B |
    | ToUpperInvariant_SameIgnoreCase | 29.8586 ns | 112 B |
    | ToUpper_SameIgnoreCase | 34.5027 ns | 112 B |

    Why Should You Care?

    “But it’s just nanoseconds,” you might say. True, but:

    • In a busy app, you might do these comparisons millions of times
    • Every little memory allocation makes the garbage collector work harder
    • These tiny slowdowns add up across your whole codebase

    This is just the first of several performance-boosting rules I’m working on. If you add these to your workflow, you’ll catch these speed bumps before they slow down your code.

    Want to see all the rules and benchmarks? Check out csharp-semgrep-performance on GitHub.