Following up on my previous post, I’ve put together a new set of Semgrep rules focused specifically on string-related performance issues in C#.
These are the kinds of things that rarely show up in code reviews, and not everything is covered by Resharper or IDEs. The issues add up though, especially if they are in a hot path. String operations can also cause a lot of allocations, leading to plenty of work for the garbage collector. These rules are designed to be lightweight, easy to integrate into your workflow, and to catch the kind of subtle inefficiencies that can quietly degrade performance over time. Some of them have auto fixes, meaning you can apply the rules to your code base and Semgrep will rewrite the matches for you. This is still a work in progress, but let’s go through the rules.
1. String Comparison
Calling ToLower() or ToUpper() just to compare strings is wasteful: it allocates a new string, converts every character, and then compares. Use string.Equals(str1, str2, StringComparison.OrdinalIgnoreCase) to compare the strings without creating any temporary strings. Resharper does not flag this.
public bool ToLower_Different()
{
    // Here ToLower allocates a new string.
    return TestString1.ToLower().Equals(TestString2);
}

public bool StringEquals_OrdinalIgnoreCase_SameIgnoreCase()
{
    // Here we compare without allocating new strings.
    return string.Equals(TestString1, TestString2,
        StringComparison.OrdinalIgnoreCase);
}
csharp-inefficient-string-comparison.yaml
rules:
  - id: csharp-inefficient-string-comparison
    patterns:
      - pattern-either:
          - pattern: $STR.ToLower().Equals($OTHER)
          - pattern: $STR.ToLowerInvariant().Equals($OTHER)
          - pattern: $STR.ToUpper().Equals($OTHER)
          - pattern: $STR.ToUpperInvariant().Equals($OTHER)
      - pattern-not: String.Equals($STR, $OTHER, StringComparison.OrdinalIgnoreCase)
    message: >
      Inefficient string comparison. Use String.Equals(s1, s2, StringComparison.OrdinalIgnoreCase)
      instead of ToLower()/ToUpper().Equals() for better performance and clarity.
    fix: String.Equals($STR, $OTHER, StringComparison.OrdinalIgnoreCase)
    languages: [csharp]
    severity: WARNING
    metadata:
      category: performance
      subcategory:
        - easyfix
        - strings
      references:
        - "https://blog.smistad.me/semgrep-rules-for-c-performance/"
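To make the rule concrete, here is a small sketch of the shape it matches and the shape the autofix produces. The class and method names are made up for illustration. One thing to be aware of: the rewrite is not strictly equivalent. The original only lowers the left-hand side and uses the current culture, while the fix ignores case on both sides using ordinal rules, so it is worth reviewing the autofix where the right-hand side is not already lower case.

```csharp
using System;

// Hypothetical names, only meant to illustrate the before/after shapes.
public static class ComparisonSketch
{
    // Shape the rule matches: allocates a lowered copy of `a` before comparing.
    // Throws NullReferenceException if `a` is null.
    public static bool Before(string a, string b) => a.ToLower().Equals(b);

    // Shape the fix rewrites to: no temporary string, and null-safe.
    public static bool After(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
}
```

With a lower-case right-hand side, such as Before("Hello", "hello") and After("Hello", "hello"), both return true. With "HELLO" on the right, Before returns false and After returns true.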
2. Avoid string.Format for cases where interpolation is enough
string.Format adds overhead and is harder to read. Interpolation ($"...") is faster and, in simple cases, can avoid allocations. For more complex formatting you should continue to use string.Format, but where it is used for basic string concatenation you should switch to string interpolation. Resharper suggests this fix if you use string.Format, but not in cases where you use string.Concat.
public string Format()
{
    // Resharper suggests switching to interpolation
    return string.Format("{0} {1} {2}", Left, Right, Middle);
}

public string Interpolation()
{
    return $"{Left} {Right} {Middle}";
}

public string Concat()
{
    // No suggestion to fix this from Resharper
    return string.Concat(Left, " ", Right, " ", Middle);
}
Method | Mean | Ratio | Allocated |
Interpolation | 0.4472 ns | 1.00 | – |
Concat | 19.2632 ns | 43.16 | 56 B |
Format | 44.2375 ns | 99.12 | 56 B |
Interpolation is much faster than the alternatives in this benchmark. The benchmark is the code shown above, and a sub-nanosecond mean with no allocation suggests that the interpolation just gets optimized away in the end, most likely folded into a constant. Simple concatenations in your actual code should still benefit compared to string.Format, although the gap will be smaller when the values are only known at runtime. It also causes fewer allocations than the alternatives.
The reason is that interpolation lets the compiler handle placeholders that just refer to a variable, so the format string does not have to be parsed at runtime. So this improvement only really applies to simple use cases.
csharp-string-format-to-interpolation.yaml
The regex for detecting more complex format parameters is not quite working, so this rule currently picks up some false positives.
rules:
  - id: csharp-string-format-to-interpolation
    languages: [csharp]
    severity: WARNING
    message: "Use string interpolation ($\"...\") instead of string.Format for simple cases"
    metadata:
      description: "Detects simple string.Format calls that could be replaced with string interpolation"
      category: "performance"
      references:
        - "https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated"
        - "https://blog.smistad.me/semgrep-rules-for-c-performance/"
      technology:
        - csharp
      subcategory:
        - easyfix
        - strings
    pattern-either:
      - pattern: string.Format("$FMT", $A1)
      - pattern: string.Format("$FMT", $A1, $A2)
      - pattern: string.Format("$FMT", $A1, $A2, $A3)
      - pattern: string.Format("$FMT", $A1, $A2, $A3, $A4)
    pattern-not-regex: \{\d+:[^}]+\}
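As a rough illustration of what the rule is aiming for, the sketch below shows a simple call that should be flagged, and a call with a format specifier ({0:X4}) that the pattern-not-regex is meant to skip. The class and method names are hypothetical. Note that interpolation supports format specifiers too, so the second case can still be converted by hand.

```csharp
// Hypothetical names, only meant to illustrate the intent of the rule.
public static class FormatSketch
{
    // Plain positional placeholders: the kind of call the rule targets.
    public static string Simple(string first, string last) =>
        string.Format("{0} {1}", first, last);

    // The equivalent interpolated string.
    public static string SimpleInterpolated(string first, string last) =>
        $"{first} {last}";

    // Placeholder with a format specifier: the regex is intended to exclude this.
    public static string WithSpecifier(int value) =>
        string.Format("{0:X4}", value);

    // Interpolation accepts the same specifier after the colon.
    public static string WithSpecifierInterpolated(int value) =>
        $"{value:X4}";
}
```

Both Simple and SimpleInterpolated produce the same string for the same inputs, and both specifier variants format 255 as 00FF.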
3. Use AsSpan() Instead of Substring()
In some cases we can avoid allocating a new string with string.Substring() and instead use .AsSpan().
Typical cases where we can avoid the allocation are inputs to the int/double/Guid.Parse() methods, or comparing a substring to a string literal.
// Allocates a new string
int.Parse(tName.Substring("VariantArray".Length));
if (s.Substring(i) == "INF")

// Using AsSpan()
int.Parse(tName.AsSpan("VariantArray".Length));
if (s.AsSpan(i).SequenceEqual("INF"))
Here Resharper will suggest using a range index instead of Substring, but this is actually slower than the Substring method. You do not avoid any allocations either, as you do if you use .AsSpan().
[Benchmark(Baseline = true)]
public string Substring()
{
    return "this is my wonderful string".Substring("this".Length);
}

[Benchmark]
public ReadOnlySpan<char> AsSpan()
{
    return "this is my wonderful string".AsSpan("this".Length);
}

[Benchmark]
public string RangeIndex()
{
    // Resharper will suggest changing your code to this
    return "this is my wonderful string"["this".Length..];
}
Method | Mean | Ratio | Allocated |
AsSpan | 0.2085 ns | 0.05 | – |
Substring | 4.5797 ns | 1.00 | 72 B |
RangeIndex | 6.5919 ns | 1.44 | 72 B |
csharp-substring.yaml
rules:
  - id: csharp-avoid-substring-for-span-accepting-methods
    languages: [csharp]
    message: Use AsSpan instead of Substring to avoid string allocations when passing to methods accepting ReadOnlySpan<char>.
    severity: WARNING
    metadata:
      category: performance
      subcategory:
        - easyfix
        - strings
      likelihood: LOW
      impact: LOW
    patterns:
      - pattern: $METHOD($STR.Substring($IDX))
      - metavariable-regex:
          metavariable: $METHOD
          regex: >
            (int|float|double|decimal|uint|long|bool|Guid|DateTime|DateTimeOffset)\.(Parse(Exact)?|TryParse(Exact)?)
    fix: $METHOD($STR.AsSpan($IDX))
  - id: csharp-avoid-substring-for-suffix
    pattern: $STR.Substring($IDX)
    message: Use AsSpan instead of Substring to avoid string allocations.
    languages: [csharp]
    severity: INFO
    metadata:
      category: performance
      subcategory:
        - easyfix
        - strings
      likelihood: LOW
      impact: LOW
    fix: $STR.AsSpan($IDX)
  - id: csharp-avoid-substring-equals
    pattern: $STR.Substring($IDX) == "$SUFFIX"
    message: Use AsSpan(...).SequenceEqual("...") instead of Substring == "..." for performance.
    languages: [csharp]
    severity: WARNING
    metadata:
      category: performance
      subcategory:
        - easyfix
        - strings
      likelihood: LOW
      impact: LOW
    fix: $STR.AsSpan($IDX).SequenceEqual("$SUFFIX")
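A minimal sketch of the two cases the rules target, plus one where the swap does not apply. The names are hypothetical, and it assumes a runtime where int.Parse accepts a ReadOnlySpan<char> (.NET Core 2.1 or later). Because AsSpan returns a ReadOnlySpan<char> rather than a string, the general csharp-avoid-substring-for-suffix fix only compiles where the consumer accepts a span, so I would treat that one as a hint rather than a blind autofix.

```csharp
using System;

// Hypothetical names, only meant to illustrate where AsSpan fits.
public static class SpanSketch
{
    // Parses the numeric suffix without allocating the intermediate substring.
    public static int ParseSuffix(string name, string prefix) =>
        int.Parse(name.AsSpan(prefix.Length));

    // Compares the tail against an expected value without allocating.
    public static bool TailIs(string s, int start, string expected) =>
        s.AsSpan(start).SequenceEqual(expected);

    // The return type requires a string here, so an allocation is unavoidable
    // and Substring stays as it is.
    public static string Tail(string s, int start) => s.Substring(start);
}
```

For example, ParseSuffix("VariantArray12", "VariantArray") parses the trailing 12, and TailIs("-INF", 1, "INF") is true.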
4. Optimize UTF-8 Transcoding
This one does not avoid any allocations, but you save some CPU cycles. This is also caught by Resharper.
return Encoding.UTF8.GetBytes("ThIs A StRiNG");
// Can be shortened to this:
return "ThIs A StRiNG"u8.ToArray();
csharp-avoid-transcoding.yaml
rules:
  - id: csharp-avoid-transcoding
    patterns:
      - pattern-either:
          - pattern: Encoding.UTF8.GetBytes("$STR")
    message: Use a u8 string literal to avoid transcoding at runtime
    fix: \"$STR\"u8.ToArray()
    languages: [csharp]
    severity: WARNING
    metadata:
      category: performance
      subcategory:
        - easyfix
        - strings
      likelihood: LOW
      impact: LOW
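A quick sketch to show that the two forms produce the same bytes. The names are hypothetical, and it assumes C# 11 or later, since that is where u8 literals were introduced. The u8 suffix only works on string literals, which is why the rule matches "$STR" and not arbitrary expressions. As a side note, the literal itself is a ReadOnlySpan<byte>, so if the consumer accepts a span you can skip ToArray() and the allocation as well.

```csharp
using System;
using System.Text;

// Hypothetical names, only meant to compare the two forms.
public static class Utf8Sketch
{
    // Transcodes the UTF-16 string to UTF-8 at runtime.
    public static byte[] Transcoded() => Encoding.UTF8.GetBytes("ThIs A StRiNG");

    // The bytes are produced at compile time; ToArray() copies them into a new array.
    public static byte[] Literal() => "ThIs A StRiNG"u8.ToArray();

    // No copy at all: the span refers to the data embedded in the assembly.
    public static ReadOnlySpan<byte> LiteralSpan() => "ThIs A StRiNG"u8;
}
```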
Want to see all the rules and benchmarks? Check out csharp-semgrep-performance on GitHub.