Strings are immutable in C#. That means that string concatenation done wrong can create lots of redundant strings, which means more work for the garbage collector.

Let’s take a look at the various ways that we can concatenate strings and see what’s actually happening behind the scenes.

Result is known at compile time

This is the simplest case. If every part of the expression is a compile-time constant, the compiler can work out what the resulting string will be and concatenates it for you at compile time.

private const string Country = "England";

var result = "I am from " + Country;

The compiler will turn this into:

var result = "I am from England";

This is obviously very efficient!
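One way to convince yourself that this happens at compile time: a const declaration only accepts a constant expression. The sketch below (the class and member names are made up for illustration) compiles only because the compiler can evaluate the concatenation itself.

```csharp
public static class CompileTimeConcat
{
    private const string Country = "England";

    // Only a constant expression is allowed on the right-hand side of a const,
    // so this line would not compile if the concatenation had to wait until run time.
    public const string Result = "I am from " + Country;
}
```

If Country were a static readonly field instead of a const, the declaration of Result would be rejected by the compiler.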

Concatenating strings on one line

I’ve seen answers on Stack Overflow, and heard people talk anecdotally, about how concatenating multiple strings using the + operator is bad practice.

var result = string1 + string2 + string3 + string4;

A common misunderstanding is that this operation always makes multiple string allocations on the way to the result, and is therefore inefficient. This is not actually the case.

The compiler turns the whole expression into a single call to String.Concat. This is what happens when you concatenate strings in this way:

  1. The total length of the result is calculated by adding the lengths of the strings together
  2. A new string of this total length is created
  3. Each string is copied into its place in the new string, using the very quick Buffer.Memcpy

This is actually a very efficient way of concatenating strings. If you are concatenating strings in a loop, however, this is not the best option.
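The three steps above can be sketched in ordinary C#. This is only an illustration, not the runtime's code: ConcatSketch is a hypothetical name, the sketch assumes none of the parts are null, and it goes through an intermediate char[] where the runtime writes straight into the new string.

```csharp
using System;

public static class ConcatSketch
{
    public static string Concat(params string[] parts)
    {
        // Step 1: work out the total length up front.
        var totalLength = 0;
        foreach (var part in parts)
        {
            totalLength += part.Length;
        }

        // Step 2: allocate a single buffer of exactly that length.
        var buffer = new char[totalLength];

        // Step 3: copy each part into its place in the buffer.
        var position = 0;
        foreach (var part in parts)
        {
            part.CopyTo(0, buffer, position, part.Length);
            position += part.Length;
        }

        return new string(buffer);
    }
}
```

However many parts are passed in, there is one pass to measure and one pass to copy, with no intermediate strings along the way.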

Concatenating strings in a loop

If you were to concatenate strings in a loop using the + operator, you could potentially end up with multiple redundant string allocations:

var allPeople = "";

foreach (var person in users) {
  allPeople += person.Name;
}

This is because each iteration of the loop will perform an independent concatenation operation, leaving behind the previous string for the garbage collector to deal with.
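To make the intermediate strings visible, here is a small sketch that records the value of the accumulator after every iteration. LoopConcatDemo and its parameters are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LoopConcatDemo
{
    // Concatenates the names with += and records the accumulator after each step.
    public static string Join(IEnumerable<string> names, List<string> intermediates)
    {
        var allPeople = "";
        foreach (var name in names)
        {
            // Equivalent to: allPeople = string.Concat(allPeople, name);
            allPeople += name;
            intermediates.Add(allPeople);
        }
        return allPeople;
    }
}
```

For the names "Ann", "Bob" and "Cy", the recorded values are "Ann", "AnnBob" and "AnnBobCy". Only the last one is wanted; the earlier ones exist purely as stepping stones, and each step copies everything accumulated so far.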

This is when it is better to use StringBuilder:

var sb = new StringBuilder();

foreach (var person in users) {
  sb.Append(person.Name);
}

var result = sb.ToString();

So what’s happening here?

  1. A new StringBuilder is created. This contains a buffer to hold the string it will produce
  2. For each .Append(someString) operation, someString is added to the StringBuilder's internal buffer
  3. If the buffer is too small to hold the string, it is expanded (or another StringBuilder is created, see later)
  4. When .ToString() is called on the StringBuilder, a real string is generated from the StringBuilder's internal buffer

Using StringBuilder, no intermediate strings are allocated. In fact, there are no new memory allocations at all unless the internal buffer isn’t big enough and needs to grow.
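If you know (or can cheaply work out) how long the result will be, you can pass a capacity to the StringBuilder constructor so that the buffer should not need to grow. A minimal sketch, where BuilderDemo and JoinNames are hypothetical names and the input is assumed to contain no nulls:

```csharp
using System.Text;

public static class BuilderDemo
{
    public static string JoinNames(string[] names)
    {
        // Measure first so the builder can start with a big enough buffer.
        var totalLength = 0;
        foreach (var name in names)
        {
            totalLength += name.Length;
        }

        // The constructor argument is the initial capacity, in characters.
        var sb = new StringBuilder(totalLength);
        foreach (var name in names)
        {
            sb.Append(name);
        }

        return sb.ToString();
    }
}
```

Measuring first costs an extra pass over the data, so this is only worth doing when the input is already in memory and the result is long enough for buffer growth to matter.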

This is much more efficient when concatenating multiple strings in a loop.

String.Format

Which is more efficient?

Console.WriteLine("Hello, " + "world");

or

var place = "world";
Console.WriteLine("Hello, {0}", place);

You can probably guess that the second option is likely to be slower, but you might not know why.

In the .NET Framework, string formatting is implemented by StringBuilder. This means that every time a string requires formatting (Console.WriteLine, String.Format etc), this has to happen:

  1. A StringBuilder is created, or retrieved from the cache
  2. .AppendFormat() is called on this StringBuilder, passing the string to format and the arguments
  3. The string is parsed to find the sections to replace, progressively writing the result to the StringBuilder's internal buffer
  4. The .ToString() method is called on the StringBuilder, creating the resulting string.
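Written out by hand, those steps look roughly like the sketch below. FormatSketch is a hypothetical name, and the sketch always creates a fresh StringBuilder rather than retrieving one from a cache.

```csharp
using System.Text;

public static class FormatSketch
{
    public static string Format(string format, params object[] args)
    {
        // Step 1: create a builder (the framework may reuse a cached one instead).
        var sb = new StringBuilder();

        // Steps 2 and 3: AppendFormat parses the format string and writes the
        // literal text and the formatted arguments into the builder's buffer.
        sb.AppendFormat(format, args);

        // Step 4: produce the final string.
        return sb.ToString();
    }
}
```

Calling FormatSketch.Format("Hello, {0}", place) gives the same result as String.Format("Hello, {0}", place), which makes the extra work compared with a plain concatenation easier to see.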

You can see that any string formatting is (relatively) expensive. If all you are doing is a basic string concatenation, you should consider another method. However, it is important to remember that in most cases readability is more important than micro-optimisations like this.

If formatting the string makes more sense, don’t be afraid to use it; just know what is happening underneath.

Large strings

In some extreme cases, you might be working with strings large enough to end up on the Large Object Heap, which holds objects of 85,000 bytes or more (for example, large JSON strings). If the string isn’t intended to be around for long, you generally don’t want it to end up on there.

The StringBuilder class is cleverly designed to avoid the Large Object Heap.

The size of each internal buffer is capped so that, by default, no single StringBuilder buffer will grow large enough to end up on the Large Object Heap.

StringBuilder objects are actually stored as a linked list, so when the internal buffer is not big enough to hold the string passed from the next .Append() call, the following happens:

  1. A new StringBuilder is created
  2. The content of the current StringBuilder is handed over to the new StringBuilder, which takes over the existing buffer
  3. A reference to the new StringBuilder is stored in the current StringBuilder's 'm_ChunkPrevious' field
  4. The current StringBuilder is given a fresh, empty buffer, ready to accept more strings

When .ToString() is called, and it is time to get the resulting string out of the StringBuilder, it can just follow its chain of linked StringBuilder objects until it has recreated the entire string.

This means that one large StringBuilder is reduced to a linked list of smaller StringBuilder objects, cunningly avoiding one big buffer ending up on the Large Object Heap. The string returned by .ToString() is still a single object, though, so a big enough result will land there regardless.
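The linked-list idea can be modelled in a few lines. The ChunkedBuilder class below is a hypothetical, heavily simplified sketch and not the real StringBuilder source: the chunk size is tiny so that the chunking is easy to observe, and a full buffer is handed to the new node rather than copied, which is an assumption of this sketch.

```csharp
using System;

public sealed class ChunkedBuilder
{
    private const int ChunkSize = 8;   // deliberately tiny for the demonstration

    private char[] _chars;             // buffer of the current chunk
    private int _used;                 // characters used in the current chunk
    private int _offset;               // characters held by all previous chunks
    private ChunkedBuilder _previous;  // plays the role of m_ChunkPrevious

    public ChunkedBuilder()
    {
        _chars = new char[ChunkSize];
    }

    // Creates a node that takes over another node's state, including its buffer.
    private ChunkedBuilder(ChunkedBuilder from)
    {
        _chars = from._chars;
        _used = from._used;
        _offset = from._offset;
        _previous = from._previous;
    }

    public int Length { get { return _offset + _used; } }

    public int ChunkCount
    {
        get
        {
            var count = 0;
            for (var chunk = this; chunk != null; chunk = chunk._previous) count++;
            return count;
        }
    }

    public ChunkedBuilder Append(string value)
    {
        foreach (var c in value)
        {
            if (_used == _chars.Length) StartNewChunk();
            _chars[_used++] = c;
        }
        return this;
    }

    private void StartNewChunk()
    {
        // Push the full chunk onto the chain, then continue with an empty buffer.
        _previous = new ChunkedBuilder(this);
        _offset += _used;
        _chars = new char[ChunkSize];
        _used = 0;
    }

    public override string ToString()
    {
        // Walk the chain backwards, copying each chunk to its recorded offset.
        var result = new char[Length];
        for (var chunk = this; chunk != null; chunk = chunk._previous)
        {
            Array.Copy(chunk._chars, 0, result, chunk._offset, chunk._used);
        }
        return new string(result);
    }
}
```

Appending "Hello, " and then "wonderful world" (22 characters in total) leaves this sketch holding three chunks of at most 8 characters each, and ToString() stitches them back together in order.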

It is worth noting that if you are working with strings of this size, you might consider whether you can use a stream instead.
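As a rough illustration of the stream approach, the sketch below writes each piece straight to a TextWriter instead of building the whole string in memory first. StreamingDemo and WriteNames are hypothetical names; any TextWriter, such as a StreamWriter over a file or network stream, could be passed in.

```csharp
using System.Collections.Generic;
using System.IO;

public static class StreamingDemo
{
    // Writes each name to the destination as it is produced,
    // so the full result never has to exist as one string.
    public static void WriteNames(TextWriter writer, IEnumerable<string> names)
    {
        foreach (var name in names)
        {
            writer.Write(name);
        }
    }
}
```

Whether this fits depends on the consumer: it only helps if whatever receives the data can accept it piece by piece rather than needing a single string.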

Conclusion

As mentioned above, readability is more important than small optimisations. That said, when two options are equally clear, it is worth knowing which is more efficient and how they work behind the scenes.