Tuesday 21 July 2015

The micro benchmarking trap

JVM Compilation Magic

(Updated - corrected code sample to include actual call to indexOf, which I only noticed after tweeting about this.)

A better indexOf for a sorted ArrayList

I'm in the process of brushing up on my knowledge of the Java Collections API at the moment.  When I came across indexOf(Object object) in ArrayList I decided to see how it works.

Although Lists in Java preserve order they cannot assume their contents will be in a particular order.  So in indexOf(myObject) on a List the safest way to guarantee that the index of the first match that exists will be returned is by starting from the head of the list and iterating through the tail.

As an exercise for myself, I decided to compare the performance of indexOf against a slightly tweaked binary search for when the elements of the list are known to be in a sorted order.

The Collections class provides a binarySearch static method which returns the index of an Object in a List - so the majority of the work has already been done for us.

The tweak that I mentioned earlier is that my List can contain duplicates.  So, once the binarySearch has found a match, we still need to check earlier in the List until we can confirm that we have the very first match.

private static void indexOfSortedList() {
    // indexOf is a bit naive, so let's see if a sorted list     // can be made to behave better with binarySearch    ArrayList sorted = new ArrayList<>(50000000);    // Pre-loading randoms    Random random = new Random();    random.setSeed(1337L);    boolean[] randoms = new boolean[20000000];
    for (int b = 0; b < 20000000; b++) {
        randoms[b] = random.nextBoolean();    }

    for (int i = 0; i < 20000000; i++) {
        sorted.add(i);
        // Introducing duplicates at random points        if (randoms[i]) {
            sorted.add(i);        }
    }

    final Integer target = Integer.valueOf(1700000);
    long timeBefore = System.currentTimeMillis();    int anIndex = Collections.binarySearch(sorted, target,            (o1, o2) -> o1.compareTo(o2)
    );
    // We treat anIndex as starting position and move backwards    // until we establish the first instance    while (anIndex > 0 && sorted.get(anIndex - 1).equals(target)) {
        anIndex--;    }
    System.out.println("Duration binary approach " + 
            (System.currentTimeMillis() - timeBefore));    System.out.println("binarySearch value: " + anIndex);
    long timeBeforeIndexOf = System.currentTimeMillis();    int indexOfValue = sorted.indexOf(target);    System.out.println("Duration indexOf " + 
            (System.currentTimeMillis() - timeBeforeIndexOf));    System.out.println("Index of value: " + indexOfValue);}

The computer scientists amongst you should appreciate how a binary search will involve far fewer operations than the linear search that indexOf is based on.

Would it surprise you to find that the indexOf implementation consistently showed a much faster performance?

That was until I scaled up the search space to have 10s of millions of elements to search through, and shifted the match to be quite high in the search space - biasing towards the binarySearch in a very unfair way.

Slightly surprised at this counterintuitive outcome, I went away for lunch and picked up a book: Java Performance : The definitive guide

A section about the JIT compiler gave me a theory to try out - perhaps this code wasn't being compiled beyond byte code because it was only being run once.

Modifying the setup to loop around calling the method showed numbers that would reinforce that theory.  After the first iteration the tweaked binarySearch performed faster than the indexOf call.

Pre-allocating for capacity

Within the same code you may have noticed that I chose to pre-initialise the capacity of the ArrayList.  This is a habit that I got into when want to reduce waste in memory allocation.

Some code that I have deleted from the example above was measuring the performance of the calls to populate the ArrayList.

I observed that specifying the capacity in advance resulted in slower insertion - even though the code for setting up the capacity was not included in the calls to be measured.

When I changed the code to have multiple calls to the method I observed that the first pass through had poor performance around 11s, but the next few calls dropped down around 4s, and subsequent passes went as low as a few hundred millis but fluctuated back up to around 4s.

Summary

My future approaches to trying some minor performance tweaking change out will not involve single runs, but will take into account how the JVM will actually massage the code into its final working state.


No comments:

Post a Comment