Wednesday 4 December 2013

Premature optimisation, or common sense - Java StringBuilder construction

When I look at a Java codebase I often notice inefficient use of StringBuilders (or StringBuffers).

Typically this will involve the default constructor being used, followed by appending String content that will obviously exceed the default StringBuilder capacity of 16 characters.

For the purposes of illustration it might look something like the following (building up a small chunk of HTML):

private String formatThings(Collection<?> somethings) {
  // build a small chunk of HTML listing each item in the collection
  StringBuilder sb = new StringBuilder();
  sb.append("<ul>\n");
  for (Object something : somethings) {
    sb.append("<li>").append(something).append("</li>\n");
  }
  sb.append("</ul>\n");
  return sb.toString();
}

In the real world the accumulated Strings might originate from parsing some XML or traversing some other structure, where the programmer could reasonably be expected to know the resulting String length.

If we look at the total number of characters involved, even a single iteration of our toy example appends more than 16 characters to the StringBuilder, so the default capacity is already exceeded.

When a StringBuilder needs more capacity than it currently has, it creates a new backing array of slightly more than double the previous capacity (twice the old capacity plus two), copies the existing characters across, and leaves the old array unreferenced, waiting to be garbage collected.

If this expansion happens every time the method is called, then every call creates at least one throwaway array that is immediately discarded and has to be garbage collected.

In a more realistic situation - still starting from the default capacity of 16, but with the accumulated String length reaching 400 or 500 characters, common enough for a few lines of formatted text - the capacity could grow like this:
16 - not big enough
34 - still not big enough
70 - not big enough
142 - still got to grow
286 - not there yet
574 - okay, let's say we're finished.

So, to end up with a sufficiently large structure we allocate and discard 16 + 34 + 70 + 142 + 286 = 548 characters of capacity across 5 intermediate arrays - more than the final String actually needs to hold.

If we know that our method is going to typically involve hundreds or thousands of characters, I would argue that the StringBuilder default constructor should be abandoned in favour of simply specifying a more appropriate size.
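A minimal sketch of what that looks like, assuming we expect roughly 500 characters of output (512 here is an arbitrary estimate):

StringBuilder sb = new StringBuilder(512);  // allocate the backing array once, sized for the expected output

With the capacity specified up front, typical inputs never trigger the grow-and-copy cycle at all.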

Update:
There is an alternative to specifying the required size at construction time - calling ensureCapacity with the required total size will grow the backing array at most once.
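As a rough sketch, reusing the toy example's collection and a made-up per-item estimate:

StringBuilder sb = new StringBuilder();        // still starts at the default capacity of 16
int expectedLength = somethings.size() * 40;   // hypothetical estimate of characters per item
sb.ensureCapacity(expectedLength);             // resizes the backing array once, if at all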

I can't recall having seen this method called in any codebase I have worked on, so I was a little surprised to find that it is not a recent addition.

Tuesday 22 October 2013

What's the point of highlighting story point "velocity"?

I see story points as an early estimation of relative complexity of a given software development task.

Part of the purpose of using story points instead of "ideal man days" is that the team can avoid getting too bogged down in how long it will actually take to achieve.

Keeping a record of these early estimates is fair enough, but is there really any point in using them as a significant indicator of the team's achievement from iteration to iteration?

What can the team do when they see the story point count is a little low so far in the current iteration?

  • Arbitrarily re-prioritise stories?
  • Cut corners by compromising on quality?
  • Work some extra time without pairing?
  • Save time in meetings by not having a say?
Estimates have some usefulness, but it is my assertion that they are counterproductive as an indicator of progress mid-iteration.


Monday 26 August 2013

What prevents Tomcat from shutting down?

Every now and then I come across a web application that won't shut down cleanly.

./catalina.sh start

works fine for starting up a Java process with the web application running in the Tomcat servlet container, but 

./catalina.sh stop

leaves the Java process running.

  • Checking the catalina.out log file reveals that the Tomcat system has attempted to shut down.
  • HTTP requests to the port(s) that Tomcat has been listening on indicate that the system is no longer accepting requests.
  • Repeat attempts at ./catalina.sh stop indicate that Tomcat is no longer listening on the shutdown port.
In my experience, without fail, the underlying cause of this problem has been a non-daemon thread started by the web application but not configured to be shut down when Tomcat shuts down.

If you are not totally familiar with how the application hangs together, you may need some hints about where to look for the offending code.  Triggering a thread dump of the Tomcat process after the shutdown has failed should reveal a list including some non-daemon threads.  The names of the threads should give an indication of whether it is a JVM thread, a Tomcat thread, or a thread that has been started by the application.
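On Linux the simplest options are to send the Java process a QUIT signal, which writes the thread dump to catalina.out, or to use the jstack tool that ships with the JDK, which prints it to standard output:

kill -3 <tomcat-pid>

jstack <tomcat-pid>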

Once you have established where the threads are being created, it is time to consider where and how to stop the threads.  Depending on the nature of the functionality being provided by the code, you may need to wait for the current processing to complete before terminating - otherwise the application would have used a daemon thread in the first place.
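As a minimal sketch - assuming the offending threads belong to an ExecutorService that the application itself created and stored under a made-up "workers" servlet context attribute - a ServletContextListener can stop the pool when the application is undeployed:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class WorkerShutdownListener implements ServletContextListener {

  @Override
  public void contextInitialized(ServletContextEvent event) {
    // the thread pool is created elsewhere and stored as the "workers" attribute
  }

  @Override
  public void contextDestroyed(ServletContextEvent event) {
    ExecutorService workers =
        (ExecutorService) event.getServletContext().getAttribute("workers");
    if (workers != null) {
      workers.shutdown();                  // let in-flight tasks finish
      try {
        if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
          workers.shutdownNow();           // give up and interrupt what is left
        }
      } catch (InterruptedException e) {
        workers.shutdownNow();
        Thread.currentThread().interrupt();
      }
    }
  }
}

The listener still has to be registered in web.xml (or with the @WebListener annotation under Servlet 3.0), and the wait time is a judgement call between a clean shutdown and not hanging forever.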

Dependency injection frameworks, such as Spring, will typically include shutdown hooks for any resources that they have responsibility for creating.

Friday 31 May 2013

MongoDB Java driver randomness - it's only logging, but

I saw a number of people on Twitter yesterday posting a link to some "interesting" code in MongoDB's Java driver on GitHub.

The strange code located inside some exception handling included the following:

                if (!((_ok) ? true : (Math.random() > 0.1))) {
                    return res;
                }

I know my own code isn't always a shining example to be held up as the one true way of doing things, but I seriously doubt that I have ever produced something as inexplicable as that.

I briefly tried to trace back the history to determine how long this has been around and whether there is a meaningful commit comment, but was in the middle of other more important things (lunch) and gave up.

The latest theory I have heard is that the code was introduced as an approach to reduce the amount of noise in logging of the exception.  I don't expect to see this recommended as a pattern any time soon.

Monday 27 May 2013

MongoDB Lightning Talk

My employer recently hosted a gathering of their IT teams from around the world for a two-day conference in Madrid.  It was a good opportunity to meet, in a relaxed social setting, colleagues that I had previously only dealt with by phone and email.

In a moment of madness I volunteered to give a presentation about a topic that I have been dabbling with in recent months - MongoDB.

My agile style of preparation - "delay commitment until the last responsible moment" - could easily have been mistaken for procrastination.  I'm not sure if I would recommend sitting on a sofa watching old episodes of Miami Vice with the laptop drifting in and out of sleep mode as being optimal for all.

Without further ado, here is a high level view of some of the content that I blended together:
  • What is MongoDB?
    • Non-Relational JSON Document Store
    • Dynamic Schema
    • Embedding of Documents and Arrays
    • Comparison with relational database (Table -> Collection, Row -> Document etc.)
  • How to Organise Data
    • Entity -> Document
    • Embedding vs Referencing
    • 16MB document size limit
    • Indexing
  • How to Query (a rough sketch with the Java driver of the time appears after this list)
    • Example of find
    • Example of insert
    • Example of aggregation framework
  • What is MongoDB being used for?
    • E-commerce
    • Analytics and Reporting
    • Content Management
      • CoreMedia Elastic Social
    • Logging
  • Language and Framework Support
    • Listing of languages and frameworks
    • Drivers have semantics to fit with the style of the language
  • Replication
    • Configurable tags - usable within write concerns
    • Configurable delay
  • Sharding
    • Distribution of writes
    • Shard key selection importance
    • Scatter-gather for reads not satisfiable by a single shard
  • Gotchas
    • Database level locking
    • Default settings not suitable for all
    • Application needs to check for result of operations
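The find and insert examples from the talk aren't reproduced above, but a minimal sketch of the same kind of calls, using the Java driver API as it stood in 2013 and entirely made-up database, collection and field names, might look like this:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

public class QuerySketch {

  public static void main(String[] args) throws Exception {
    MongoClient client = new MongoClient("localhost", 27017);
    DB db = client.getDB("talkdemo");
    DBCollection things = db.getCollection("things");

    // insert a document
    things.insert(new BasicDBObject("name", "lightning talk")
        .append("durationMinutes", 5));

    // find it again
    DBCursor cursor = things.find(new BasicDBObject("name", "lightning talk"));
    while (cursor.hasNext()) {
      System.out.println(cursor.next());
    }

    client.close();
  }
}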

Wednesday 6 March 2013

Online training - lectures

I've recently started an online course to learn a bit more about MongoDB.

The lectures are made available as YouTube clips, which is good when I have time at home and there is nothing interesting on TV.  I think it could be better if I could download the videos and watch them on my iPad during my commute to and from work.

Something like a podcast would deliver the same content, but not have the ability to reinforce the learning with the mini quiz that follows some videos on the current online system.

Sunday 3 February 2013

Comparing Java web application runtime environments

In a recent project the developers in my team have encountered some unexpected differences in behaviour at runtime.  What works fine in one environment could result in a server error in other - supposedly equivalent - environments.

The code and configuration of the web application should in theory be the same, so I'd like to have a way of checking what is actually different under the hood at runtime.

My initial thoughts are that the following will be relevant:
 - environment variables
 - JVM properties
 - loaded classes (or packages)

Exposing the data is one aspect, but in order for this to be useful beyond a one-off check it would be good to have a way of quickly highlighting any differences between the output from two setups.

If the data is all key-value pairs then I expect detecting the differences to be trivial.
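As a minimal sketch of the first two items - a hypothetical class that dumps environment variables and JVM system properties as sorted key=value lines, ready to be compared with a plain diff:

import java.util.Map;
import java.util.TreeMap;

public class RuntimeEnvironmentDump {

  public static void main(String[] args) {
    TreeMap<String, String> entries = new TreeMap<String, String>();

    // environment variables
    for (Map.Entry<String, String> env : System.getenv().entrySet()) {
      entries.put("env:" + env.getKey(), env.getValue());
    }

    // JVM system properties
    for (String name : System.getProperties().stringPropertyNames()) {
      entries.put("property:" + name, System.getProperty(name));
    }

    for (Map.Entry<String, String> entry : entries.entrySet()) {
      System.out.println(entry.getKey() + "=" + entry.getValue());
    }
  }
}

Loaded classes and packages would need something extra - an agent, a profiler, or at least a look at what the ClassLoader hierarchy reports - so they are left out of this sketch.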

UPDATE:
The problem JSPs turned out to include some expression language syntax.

Sunday 20 January 2013

Apple, welcome to the mainstream

If you'd asked me ten years or so ago what I thought about Apple computers, my response might have been along the lines of "Hmm, they're the computers that people use for desktop publishing and graphic design - people more interested in pretty colours than practical things."

Around the same time, the company that I was working for had developed a new e-commerce website and faced the challenging proposition of ensuring the HTML and CSS styling would not look bad on a range of browsers.  As I wasn't directly involved in that particular project the main issue that I can recall hearing about was that some content didn't look right on MacBook laptops.  Sure enough the CEO of the company was using that particular setup - I'm guessing that less than two percent of the site's potential customers were in the same situation.

Fast forward to the present day and I find myself typing this blog entry on an Apple laptop, listening to music from an Apple phone, procrastinating further after watching a movie via an Apple TV, wondering whether I should ask my housemate to get her own keyboard so that I can make use of my Mac Mini, which might prompt me to clear some space on the desk for the speaker system for the iPad.

Then there's the day job where I am a little envious of the other people in my office receiving shiny new Apple kit, while my team stays on our Linux desktop setup.

Time to focus

I did a tally up a week or so ago and realised that I am part way through reading at least eight books.

It hasn't stopped me from buying more books, but it might be worth allowing more time to finish them.  Luckily BBC iPlayer is misbehaving on my PC, so I should be spending less time watching shows on my computer.

Every time the application is restarted the list of videos downloaded or coming soon shows up as being empty.  Deleting the content from the repository on my hard drive and effectively starting over seems to be the only way to get it to function again.

Sunday 6 January 2013

e-book digital download more expensive than paperback

On two recent occasions when I have been looking to purchase a new book online, I have noticed that the Kindle edition has been more expensive than the equivalent paperback edition.

I felt sure it must be far cheaper to produce the digital download than the physical paper book.

It turns out that value added tax applies to ebooks but not to the physical version.