I'll just come right out and say it: I love strings. As far as I'm concerned, there isn't a problem that I can't solve with a string and perhaps a regular expression or two. But maybe that's just my lack of math skills talking.
In all seriousness, though, the type of programming we do on Stack Overflow is intimately tied to strings. We're constantly building them, merging them, processing them, or dumping them out to an HTTP stream. Sometimes I even give them relaxing massages. Now, if you've worked with strings at all, you know that this is code you desperately want to avoid writing:
static string Shlemiel()
{
    string result = "";
    for (int i = 0; i < 314159; i++)
    {
        result += getStringData(i);
    }
    return result;
}
In most garbage collected languages, strings are immutable: when you add two strings, the contents of both are copied into a brand new string. As you keep adding to result in this loop, more and more memory is allocated and copied each time. This leads directly to awful quadratic O(n²) performance, or as Joel likes to call it, Shlemiel the painter performance.
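For contrast, here is a minimal sketch of the same loop next to a StringBuilder version, which appends into a growable buffer instead of recopying the whole string on every pass. GetStringData is a hypothetical stand-in for the getStringData call above, and the count parameter is an assumption added so the two methods can be compared on small inputs.

```csharp
using System;
using System.Text;

static class ShlemielDemo
{
    // Hypothetical stand-in for the article's getStringData(i).
    static string GetStringData(int i) => i.ToString();

    // Quadratic: every += allocates a new string and copies
    // everything accumulated so far.
    public static string Shlemiel(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += GetStringData(i);
        }
        return result;
    }

    // Roughly linear: Append writes into an internal buffer that
    // grows as needed, so earlier data is not recopied per iteration.
    public static string NotShlemiel(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(GetStringData(i));
        }
        return sb.ToString();
    }
}
```

Both methods return the same string; only the amount of copying along the way differs.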
Who is Shlemiel? He's the guy in this joke:
Shlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. "That's pretty good!" says his boss, "you're a fast worker!" and pays him a kopeck.
The next day Shlemiel only gets 150 yards done. "Well, that's not nearly as good as yesterday, but you're still a fast worker. 150 yards is respectable," and pays him a kopeck.
The next day Shlemiel paints 30 yards of the road. "Only 30!" shouts his boss. "That's unacceptable! On the first day you did ten times that much work! What's going on?"
"I can't help it," says Shlemiel. "Every day I get farther and farther away from the paint can!"
This is a softball question. You all knew that. Every decent programmer knows that string concatenation, while fine in small doses, is deadly poison in loops.
But what if you're doing nothing but small bits of string concatenation, dozens to hundreds of times -- as in most web apps? Then you might develop a nagging doubt, as I did, that lots of little Shlemiels could possibly be as bad as one giant Shlemiel.
Let's say we wanted to build this HTML fragment:
stuff
stuff
stuff
stuff
Which might appear on a given Stack Overflow page anywhere from one to sixty times. And we're serving up hundreds of thousands of these pages per day.
Not so clear-cut, now, is it?
So, which of these methods of forming the above string do you think is fastest over a hundred thousand iterations?
1: Simple Concatenation
string s =
@"" + st() + st() + @"
" + st() + @"
" + st() + "
" + st() + "
";
return s;
2: String.Format
string s =
@"{0}{1}
{2}
{3}
{4}
";
return String.Format(s, st(), st(), st(), st(), st());
3: string.Concat
string s =
string.Concat(@"", st(), st(),
@"
", st(),
@"
", st(), "
",
st(), "
");
return s;
4: String.Replace
string s =
@"{s1}{s2}
{s3}
{s4}
{s5}
";
s = s.Replace("{s1}", st()).Replace("{s2}", st()).
Replace("{s3}", st()).Replace("{s4}", st()).
Replace("{s5}", st());
return s;
5: StringBuilder
var sb = new StringBuilder(256);
sb.Append(@"");
sb.Append(st());
sb.Append(st());
sb.Append(@"
");
sb.Append(st());
sb.Append(@"
");
sb.Append(st());
sb.Append("
");
sb.Append(st());
sb.Append("
");
return sb.ToString();
Take your itchy little trigger finger off that compile key and think about this for a minute. Which one of these methods will be faster?
Got an answer? Great!
And.. drumroll please.. the correct answer:
It. Just. Doesn't. Matter!
We already know none of these operations will be performed in a loop, so we can rule out brutally poor performance characteristics of naive string concatenation. All that's left is micro-optimization, and the minute you begin worrying about tiny little optimizations, you've already gone down the wrong path.
Oh, you don't believe me? Sadly, I didn't believe it myself, which is why I got drawn into this in the first place. Here are my results -- for 100,000 iterations, on a dual core 3.5 GHz Core 2 Duo.
Method                  | Time (100,000 iterations)
1: Simple Concatenation | 606 ms
2: String.Format        | 665 ms
3: string.Concat        | 587 ms
4: String.Replace       | 979 ms
5: StringBuilder        | 588 ms
Even if we went from the worst performing technique to the best one, we would have saved a lousy 392 milliseconds over a hundred thousand iterations. Not the sort of thing that I'd throw a victory party over. I guess I figured out that using .Replace is best avoided, but even that has some readability benefits that might outweigh the minuscule cost.
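To put those totals in per-call terms: 979 ms over 100,000 iterations is about 9.8 microseconds per fragment, and 587 ms is about 5.9 microseconds, so the gap is roughly 3.9 microseconds per fragment. Even at sixty fragments on a page, that is under a quarter of a millisecond. If you want to repeat the experiment, something like the following sketch would do. It is not the harness used for the numbers above; St is a hypothetical stand-in for st(), the HTML markup is left out, and only two of the five methods are shown.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class MicroBench
{
    // Hypothetical stand-in for the article's st().
    static string St() => "stuff";

    // Method 1, simple concatenation (markup omitted).
    public static string Concat() =>
        St() + St() + "\n" + St() + "\n" + St() + "\n" + St() + "\n";

    // Method 5, StringBuilder (markup omitted).
    public static string Builder()
    {
        var sb = new StringBuilder(256);
        sb.Append(St());
        sb.Append(St());
        sb.Append("\n");
        sb.Append(St());
        sb.Append("\n");
        sb.Append(St());
        sb.Append("\n");
        sb.Append(St());
        sb.Append("\n");
        return sb.ToString();
    }

    // Times `iterations` calls of `build` after one warm-up call,
    // so JIT compilation is not counted.
    public static TimeSpan Time(Func<string> build, int iterations)
    {
        build();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            build();
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Converts a total in milliseconds to microseconds per call.
    public static double MicrosecondsPerCall(double totalMs, int iterations) =>
        totalMs * 1000.0 / iterations;
}
```

Calling MicroBench.Time(MicroBench.Concat, 100000) and MicroBench.Time(MicroBench.Builder, 100000) gives totals comparable in spirit to the table, though the absolute numbers will depend on your hardware and runtime.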
Now, you might very well ask which of these techniques has the lowest memory usage, as Rico Mariani did. I didn't get a chance to run these against CLRProfiler to see if there was a clear winner in that regard. It's a valid point, but I doubt the results would change much. In my experience, techniques that abuse memory also tend to take a lot of clock time. Memory allocations are fast on modern PCs, but they're far from free.
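Short of a full profiler run, one rough way to compare allocations is to read the runtime's per-thread allocation counter before and after a batch of calls. This is only a sketch: it assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread (newer .NET versions do), St is a hypothetical stand-in for st(), and the two fragments are shortened versions of the methods above with the markup left out. It makes no claim about which technique wins.

```csharp
using System;

static class AllocationProbe
{
    // Hypothetical stand-in for the article's st().
    static string St() => "stuff";

    // Shortened version of method 4, String.Replace.
    public static string ViaReplace() =>
        "{s1}\n{s2}\n".Replace("{s1}", St()).Replace("{s2}", St());

    // Shortened version of method 3, string.Concat.
    public static string ViaConcat() =>
        string.Concat(St(), "\n", St(), "\n");

    // Returns the bytes allocated on this thread while calling
    // `build` `iterations` times, after one warm-up call.
    public static long AllocatedBytes(Func<string> build, int iterations)
    {
        build();
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            build();
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Comparing AllocationProbe.AllocatedBytes(AllocationProbe.ViaReplace, 100000) with the same call for ViaConcat gives a first impression; a profiler is still the right tool for anything more detailed.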
Opinions vary on just how many strings you have to concatenate before you should start worrying about performance. The general consensus is around 10. But you'll also read crazy stuff, like this:
Don't use += concatenating ever. Too many changes are taking place behind the scene, which aren't obvious from my code in the first place. I advise you to use String.Concat() explicitly with any overload (2 strings, 3 strings, string array). This will clearly show what your code does without any surprises, while allowing yourself to keep a check on the efficiency.
Never? Ever? Never ever ever? Not even once? Not even if it doesn't matter? Any time you see "don't ever do X", alarm bells should be going off. Like they hopefully are right now.
Yes, you should avoid the obvious beginner mistakes of string concatenation, the stuff every programmer learns their first year on the job. But after that, you should be more worried about the maintainability and readability of your code than its performance. And that is perhaps the most tragic thing about letting yourself get sucked into micro-optimization theater -- it distracts you from your real goal: writing better code.
[Source: Coding Horror - Posted by Kishore Vengala]