Wednesday 4 December 2013

Premature optimisation, or common sense - Java StringBuilder construction

When I look at a Java codebase I often notice inefficient use of StringBuilders (or StringBuffers).

Typically this involves the default constructor being used, followed by appending String content that will obviously exceed the default StringBuilder capacity of 16 characters.

For the purposes of illustration it might look something like the following:

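// requires: import java.util.Collection;
private String formatThings(Collection<?> somethings) {
  // Default constructor: the backing array starts at 16 characters
  StringBuilder sb = new StringBuilder();
  // The list markup in these literals is illustrative
  sb.append("<ul>\n");
  for (Object something : somethings) {
    sb.append("<li>").append(something).append("</li>\n");
  }
  sb.append("</ul>\n");
  return sb.toString();
}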

In the real world the accumulating Strings might originate from parsing some XML or traversing some other structure, where the programmer could reasonably be expected to know the approximate length of the resulting String.

If we look at the total number of characters involved even for a single iteration in our toy example then we can see that the number of characters appended to the StringBuilder exceeds 16.

When an append would exceed the existing capacity, the StringBuilder creates a new array of slightly more than double the previous capacity (twice the old capacity plus 2) and copies the existing characters into it.  This leaves the old array unreferenced, waiting to be garbage collected.
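To watch this happen, here is a minimal sketch that appends one character at a time and records every distinct value reported by capacity(). The class and method names (CapacityTrace, trace) are my own, and the exact growth steps are an implementation detail of the JDK rather than a documented guarantee, so treat the printed numbers as an observation, not a contract.

```java
import java.util.ArrayList;
import java.util.List;

class CapacityTrace {
    // Appends one character at a time and records each distinct capacity seen.
    static List<Integer> trace(int totalChars) {
        StringBuilder sb = new StringBuilder(); // documented default capacity: 16
        List<Integer> capacities = new ArrayList<>();
        capacities.add(sb.capacity());
        for (int i = 0; i < totalChars; i++) {
            sb.append('x');
            int current = sb.capacity();
            if (current != capacities.get(capacities.size() - 1)) {
                capacities.add(current); // the backing array was just replaced
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        System.out.println(trace(500));
    }
}
```

Each entry after the first corresponds to one array allocation plus one copy of everything appended so far.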

If that expansion happens every time the method is called, then each call creates at least one array that is immediately discarded and needs to be garbage collected.

In a more realistic situation the builder still starts at the default capacity of 16 characters, but the accumulated String grows to 400 or 500 characters, which is common enough for a few lines of formatted text. The capacity then has to step up repeatedly:
16 - not big enough
34 - still not big enough
70 - not big enough
142 - still got to grow
286 - not there yet
574 - okay, let's say we're finished.

So, to get up to a sufficiently large structure we waste 16 + 34 + 70 + 142 + 286 = 548 characters of capacity across 5 discarded arrays, which is more than the text we actually need to hold.
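The arithmetic can be checked with a few lines of code. This is only a sketch: it assumes the "twice the old capacity plus 2" growth rule implied by the numbers above, and the names GrowthArithmetic, capacitiesNeeded and discarded are hypothetical helpers of mine, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthArithmetic {
    // Assumed growth rule: each new array holds (old * 2) + 2 characters.
    static List<Integer> capacitiesNeeded(int initial, int required) {
        List<Integer> steps = new ArrayList<>();
        int capacity = initial;
        steps.add(capacity);
        while (capacity < required) {
            capacity = capacity * 2 + 2;
            steps.add(capacity);
        }
        return steps;
    }

    // Sum of every capacity except the final one: the arrays left for the GC.
    static int discarded(List<Integer> steps) {
        int total = 0;
        for (int i = 0; i < steps.size() - 1; i++) {
            total += steps.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> steps = capacitiesNeeded(16, 500);
        System.out.println(steps);            // [16, 34, 70, 142, 286, 574]
        System.out.println(discarded(steps)); // 548
    }
}
```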

If we know that our method is going to typically involve hundreds or thousands of characters, I would argue that the StringBuilder default constructor should be abandoned in favour of simply specifying a more appropriate size.
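As a rough sketch of what that might look like for the toy method, assuming we can guess an average size per item: the 32-character estimate, the class name PresizedFormatter and the list markup in the literals are all illustrative assumptions of mine, and a real estimate would come from knowing your data.

```java
import java.util.Arrays;
import java.util.Collection;

class PresizedFormatter {
    // Hypothetical estimate of characters per formatted item, markup included.
    static final int ESTIMATED_CHARS_PER_ITEM = 32;

    // Returns the builder itself so that its capacity can be inspected.
    static StringBuilder build(Collection<?> somethings) {
        // 16 extra characters cover the opening and closing markup.
        int estimate = 16 + somethings.size() * ESTIMATED_CHARS_PER_ITEM;
        StringBuilder sb = new StringBuilder(estimate);
        sb.append("<ul>\n");
        for (Object something : somethings) {
            sb.append("<li>").append(something).append("</li>\n");
        }
        sb.append("</ul>\n");
        return sb;
    }

    static String formatThings(Collection<?> somethings) {
        return build(somethings).toString();
    }

    public static void main(String[] args) {
        System.out.print(formatThings(Arrays.asList("a", "b", "c")));
    }
}
```

If the estimate is too low the builder simply grows as before, so a wrong guess costs no more than the default constructor would have; a generous guess trades a little unused capacity for avoiding the repeated copies.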

Update:
There is an alternative to specifying the required size at construction time: calling ensureCapacity with the required total size grows the backing array at most once, and only if necessary.
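A small sketch of that approach, with a hypothetical class and method name (EnsureCapacityDemo, capacitiesAround) chosen for illustration. It records the capacity before the call, after the call, and after appending the promised number of characters.

```java
class EnsureCapacityDemo {
    // Returns {capacity before, capacity after ensureCapacity, capacity after appends}.
    static int[] capacitiesAround(int required) {
        StringBuilder sb = new StringBuilder(); // starts at the default capacity
        int before = sb.capacity();

        sb.ensureCapacity(required); // at most one reallocation happens here
        int afterEnsure = sb.capacity();

        for (int i = 0; i < required; i++) {
            sb.append('x'); // fits in the array that is already allocated
        }
        int afterAppend = sb.capacity();

        return new int[] {before, afterEnsure, afterAppend};
    }

    public static void main(String[] args) {
        int[] result = capacitiesAround(500);
        System.out.println(result[0] + " -> " + result[1] + " -> " + result[2]);
    }
}
```

This is handy when the builder is handed to you already constructed, or when the required size only becomes known after construction.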

I can't recall having seen this method called in codebases that I have worked on, so I was a little surprised to see that it is not a recent addition to the API.