Thoughts on software development: 2011

Sunday 4 December 2011

Auto-refreshing Caches - Part 2

In case you can't guess from the title of this post, this is the second in a series of posts related to my experiences developing and tuning an auto-refreshing cache.

Challenges for pre-loading

The caching software that we used for this project has a typical key-oriented lookup for determining whether an item has already been populated into the cache.

The code for pre-loading entries into the cache needs to use the same key value that a regular cache lookup will use. This raises two concerns:
- The default cache key generation system may not be appropriate.
- The key generation settings need to be kept in sync between the pre-loader and the regular fetcher.
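
One way to keep the two in sync is to route both code paths through a single key-building method. The following is only a minimal sketch of that idea: the class names `ScheduleCacheKeys` and `ScheduleCache`, the map-backed cache and the date/group id parameters are all assumptions for illustration, not the project's actual code or the caching library's API.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical single source of truth for cache keys.
final class ScheduleCacheKeys {
    private ScheduleCacheKeys() {}

    // Builds the key from the same criteria used for the data source lookup.
    static String forDateAndGroup(LocalDate date, long groupId) {
        return date + "|" + groupId;
    }
}

// Hypothetical map-backed cache standing in for the real caching software.
final class ScheduleCache {
    private final Map<String, String> entries = new HashMap<>();

    // Pre-loader path: populate the entry ahead of any user request.
    void preload(LocalDate date, long groupId, String data) {
        entries.put(ScheduleCacheKeys.forDateAndGroup(date, groupId), data);
    }

    // Regular fetch path: returns null on a cache miss.
    String lookup(LocalDate date, long groupId) {
        return entries.get(ScheduleCacheKeys.forDateAndGroup(date, groupId));
    }
}
```

Because neither path builds a key by hand, a change to the key format cannot leave the pre-loader and the fetcher disagreeing.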

Unit testing the service layer

By setting up a unit test for the pre-loading of each cache we have added some automated checking of the cache key generation.

The steps required are:
- Mocking of DAO for service layer to use in tests
- Calling service layer with known parameters
- Generating expected cache key with same known parameters
- Asserting that the expected cache key matches with the key of the cached entry.
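
The steps above could be sketched roughly as follows. To keep the example self-contained it uses a hand-written stub instead of a mocking library and a plain check instead of a test framework; `EventDao`, `EventService` and the key format are hypothetical names invented for this sketch. In the real project the key comes from the caching software's key generator, which is exactly what the test is there to verify.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical DAO interface; the real one talks to the remote data source.
interface EventDao {
    List<String> findEvents(String date, long groupId);
}

// Hypothetical service that pre-loads DAO results into a map-backed cache.
final class EventService {
    final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final EventDao dao;

    EventService(EventDao dao) {
        this.dao = dao;
    }

    // Assumed key format, for illustration only.
    static String cacheKey(String date, long groupId) {
        return date + "|" + groupId;
    }

    void preload(String date, long groupId) {
        cache.put(cacheKey(date, groupId), dao.findEvents(date, groupId));
    }
}

final class EventServicePreloadCheck {
    // Returns true when the pre-loaded entry sits under the expected key.
    static boolean preloadUsesExpectedKey() {
        // 1. Stub the DAO so that no network access is needed.
        EventDao stubDao = (date, groupId) -> Arrays.asList("event-a", "event-b");
        EventService service = new EventService(stubDao);

        // 2. Call the service layer with known parameters.
        service.preload("2011-12-04", 42L);

        // 3. Generate the expected cache key from the same known parameters.
        String expectedKey = EventService.cacheKey("2011-12-04", 42L);

        // 4. Check that the cached entry is stored under that key.
        return service.cache.size() == 1 && service.cache.containsKey(expectedKey);
    }

    public static void main(String[] args) {
        if (!preloadUsesExpectedKey()) {
            throw new AssertionError("pre-loaded entry not found under the expected key");
        }
        System.out.println("cache key check passed");
    }
}
```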

Company Christmas Party

Recently the company that I currently work for acquired 80% ownership of an e-commerce company that I previously worked for.

I'll see a few more familiar faces at the Christmas party this week.

With a bit of luck I might even do some match-making - borrowing a developer to give the team a hand for my current project.

Tuesday 22 November 2011

Hey, that's cool! - Contagious enthusiasm

I find it fun to work with young developers: their enthusiasm and willingness to learn new things make leading the way less of a chore and more of a privilege. Today was a great example.

Late in the day I set up a user account in Jira and emailed the details across to the newest member of the team. As we got ready to shut down our development for the day he said something along the lines of, "Hey, that's cool. I'll definitely be using this. I've only used Bugzilla before, but this looks heaps better. Can it integrate with tasks in Eclipse?"

The next ten minutes flew by as I gave a brief demo of the plugin, including uploading/downloading of contexts. Then it was time to show how the Subversion check-in comments automatically tie together which files were updated for each task.

I think the next step will be to get Jenkins running on a shared server instead of just on my laptop.

Sunday 20 November 2011

Auto-Refreshing Caches - Part 1

Introduction

This blog entry provides an overview of my recent experience of developing a software system which keeps its core data fresh and ready to be presented to users.

It is a work in progress, so may end up spanning a few posts.

What do I mean by an auto-refreshing cache?

A cache which has its content pre-loaded and refreshed automatically without user action.

Why do I want an auto-refreshing cache?

The desired time for displaying content on screen is less than the time required to fetch the data from the data source across the network - by orders of magnitude.

The remote data source and network connection are beyond our control.

There's more than one way data can change over time

There are two primary ways in which the data in this system changes over time, which should be reflected in the state of the cache:
- Data that should no longer be displayed because it is no longer relevant
- Data properties that change over time

Automating removal

The data to be cached includes some date and time properties which can be used as a basis for removal from the cache.

Due to the diverse nature of the data we have multiple caches. In some caches an entity has its own entry keyed by a unique identifier, while in other caches multiple entities are grouped together by a key generated from the criteria used for the data source lookup (e.g. date and group id).

For the grouped data, the approach for removing expired data involves:
- iterating over the cache entries
- checking that the overall cache entry is not due to expire
- obtaining a write lock on the cache entry
- iterating over the entities contained in the cached data structure and removing those that are expired
- putting the updated data back into the cache
- releasing the write lock on the cache entry
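
As a rough illustration of those steps, here is a sketch built on plain JDK collections and locks rather than the caching software's own API. The `Entity`, `GroupEntry` and `GroupedCache` types are hypothetical, as is the assumption that expiry is held as a millisecond timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cached entity with its own expiry time (epoch millis).
final class Entity {
    final String id;
    final long expiresAt;

    Entity(String id, long expiresAt) {
        this.id = id;
        this.expiresAt = expiresAt;
    }
}

// Hypothetical cache entry: a group of entities plus the entry's own expiry.
final class GroupEntry {
    final List<Entity> entities;
    final long expiresAt;

    GroupEntry(List<Entity> entities, long expiresAt) {
        this.entities = entities;
        this.expiresAt = expiresAt;
    }
}

final class GroupedCache {
    final Map<String, GroupEntry> entries = new ConcurrentHashMap<>();
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantReadWriteLock());
    }

    void removeExpiredEntities(long now) {
        // Iterate over a snapshot of the cache keys.
        for (String key : new ArrayList<>(entries.keySet())) {
            GroupEntry entry = entries.get(key);
            // Skip entries that are due to expire as a whole anyway.
            if (entry == null || entry.expiresAt <= now) {
                continue;
            }
            ReentrantReadWriteLock.WriteLock writeLock = lockFor(key).writeLock();
            writeLock.lock();
            try {
                // Re-read under the lock in case the entry changed meanwhile.
                GroupEntry current = entries.get(key);
                if (current == null) {
                    continue;
                }
                // Keep only the entities that have not yet expired.
                List<Entity> live = new ArrayList<>();
                for (Entity e : current.entities) {
                    if (e.expiresAt > now) {
                        live.add(e);
                    }
                }
                // Put the updated data back into the cache.
                entries.put(key, new GroupEntry(live, current.expiresAt));
            } finally {
                writeLock.unlock();
            }
        }
    }
}
```

The write lock only helps if every other writer of the same entry takes it too, so in practice the locking would need to go through whatever mechanism the caching software provides.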

Automating updates of existing entries

For data which is already held in the cache and is not ready to be removed, we can re-fetch the data from the data source and write it into the cache.
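
A minimal sketch of such a refresh, assuming a map-backed cache and a scheduled background task. `RefreshingCache` and the `dataSource` function are hypothetical stand-ins for the real cache and the remote lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class RefreshingCache {
    final Map<String, String> entries = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the slow remote data source.
    private final Function<String, String> dataSource;

    // Single named daemon thread, so it is easy to spot in a thread dump.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cache-refresher");
                t.setDaemon(true);
                return t;
            });

    RefreshingCache(Function<String, String> dataSource) {
        this.dataSource = dataSource;
    }

    // Re-fetches every key that is currently cached.
    void refreshAll() {
        for (String key : entries.keySet()) {
            try {
                String fresh = dataSource.apply(key);
                if (fresh != null) {
                    // replace() only writes if the key is still present, so an
                    // entry removed in the meantime is not resurrected.
                    entries.replace(key, fresh);
                }
            } catch (RuntimeException e) {
                // Keep serving the stale value. An exception escaping the task
                // would also stop the scheduler from running it again.
            }
        }
    }

    void start(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(this::refreshAll, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Users keep reading the old value until the new one has been fetched, so the slow remote call never sits on the request path.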

Wednesday 19 October 2011

Ensuring your threads shut down

A while ago I joined a project which had some old school (naive) approaches to concurrent processing in an object-oriented language.

In these enlightened times I automatically double check whenever I see a Java class that has been declared with "extends Thread" (not just because some former colleagues regarded the extends keyword as blasphemy).

I've read enough to know to prefer implementing an interface rather than extending a concrete class. In this case the appropriate interface is Runnable, without even delving into the excellent java.util.concurrent APIs.

For this most recent project I have gradually eliminated all "extends Thread", sometimes by simply implementing Runnable, and sometimes by introducing a TimerTask.

Today I have been troubleshooting some problems which our Java application has when interacting with native code (compiled C++ libraries for interacting with hardware). As part of this exercise the application needed to be overhauled to ensure that all threads that are started are correctly shut down when Tomcat is shut down.

jvisualvm proved to be a useful tool once again for inspecting which threads were not terminating when I ran the application on my laptop.

As with practically everything else in programming, good naming helps: assigning a useful name to each and every Thread and Timer made the problem so much easier to trace.

Remember the @PreDestroy annotation is your friend.
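
To illustrate the general shape, here is a sketch of a worker with a named thread and an explicit shutdown method. `HardwarePoller` and its sleep-based loop are made up for the example. To keep it runnable without extra jars the annotation is only mentioned in a comment; in a container-managed bean the `shutdown()` method is where I would expect @PreDestroy to go.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class HardwarePoller {
    private final AtomicInteger threadCounter = new AtomicInteger();

    // Give every thread a meaningful name so it is easy to find in jvisualvm.
    private final ThreadFactory namedThreads =
            r -> new Thread(r, "hardware-poller-" + threadCounter.incrementAndGet());

    private final ExecutorService executor = Executors.newSingleThreadExecutor(namedThreads);

    void start() {
        executor.execute(() -> {
            // Loop until interrupted, so that shutdownNow() can stop the worker.
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    TimeUnit.MILLISECONDS.sleep(50); // stand-in for the real work
                } catch (InterruptedException e) {
                    // Restore the flag so that the loop condition sees it.
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // In a container-managed bean this method would carry @PreDestroy.
    // Returns true if the worker thread terminated within the timeout.
    boolean shutdown() {
        executor.shutdownNow();
        try {
            return executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```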

Thursday 13 October 2011

Facebook reverse psychology on privacy

I have recently taken a few weeks off work to travel back to my home country to catch up with friends and family, and soak up some of the Rugby World Cup hype.

When it came time to look for contact details for various friends who I hadn't seen for a while, I ended up resorting to Facebook messaging.

A while ago Facebook opened some loophole which allowed me to see contact phone numbers for anyone who I was a Facebook friend with. I quickly thought of that as a not-so-good idea and opted out, but recently I have been thinking this may have been a cunning ploy by Facebook to ensure that Facebook messaging becomes the primary means of communication for situations such as mine.

One of these days I will probably leave Facebook, as it is becoming too ubiquitous, so must inevitably be avoided.

Thursday 22 September 2011

ORM you say?

I had a fun chat with some of the guys and girls of the London Java Community (LJC) this week.

There were a few of the usual war stories - like the guy who was talked into going along to a job interview and found that the company had a very zealous approach to "not invented here" technologies - to the extent that they had actually produced their own source control system.

Not embracing open source is one thing, but reinventing the wheel is another extreme (unless you happen to be Linus Torvalds).

A few pints of Magners into the evening, discussion turned to a recent blog post by someone who was calling out ORM as an anti-pattern.

During my first year in London I took the time to read Java Persistence with Hibernate. It gave me a greater appreciation for what Hibernate was capable of, and an understanding of how easy it could be to generate poorly performing data access (such as the infamous N+1 queries).

When it came time for me to wear the "Technical Team Lead" hat on an e-commerce project, I learned the hard way what effect a seemingly trivial design decision could have. At the time the Hybris e-commerce platform included a proprietary XML format for producing Java classes representing persistable objects. The way that the objects could relate to each other could result in database table structures that were not SQL friendly - which made the search engine indexing less useful.

It's my strongly held belief that no matter how good your ORM system is, there will always be a time when someone has to pull out ALTER TABLE and friends to refactor.

NB: NoSQL systems have not been considered in any of the above statements.

Tuesday 30 August 2011

I guess it just wasn't meant to be

Back in August 2010 I decided to leave my job to find a more satisfying role while I still had a reasonable amount of time left on my visa.

A few weeks later I received an unsolicited approach from someone who I had never met, but who now worked at Autoquake and had liked the look of my LinkedIn profile. I had interviewed with Autoquake before and had actually been offered a position, but hadn't liked the fact that they were still a start-up.

This time around there would be fresh faces from a team that had previously worked together on highly successful well-known web applications etc. etc. So I agreed to another interview.

I was a bit rusty in the interview process and, with only 5 minutes at the whiteboard, was unsurprisingly not able to come up with a brilliant solution to a problem that had taken them quite some time to solve.

I wasn't entirely surprised not to hear back from any of the people who had interviewed me that time around, though it wasn't particularly courteous or professional of them.

Today I came across something which reminded me of the company, so I tried some Googling, only to find that they no longer exist:
http://www.which.co.uk/news/2011/03/massive-online-used-car-retailer-autoquake-folds-248918/

It's just as well I didn't place much value on the stock options they included in their package offer first time around.

Thursday 25 August 2011

Communication - it's not meant to be a one way thing

I don't know about you, but if I send an email asking a specific question, then I expect a response within a day or so - even if it's just to say, "We'll have to get back to you at the end of the week".

If this lack of responsiveness was from a massive busy government department with an anonymous email address on their website I might understand, but for a commercial organisation that is paying for my services it really doesn't make sense.

Perhaps it's a hangover from the 20th century when some businesses provided their employees limited access to a shared computer for such high tech communication?

Monday 22 August 2011

Making sure that more isn't really less

During a recent code review I came across a slightly naive approach to speeding up a group of HTTP requests - by allocating a thread to each request and setting them all to run at once.

Given that I have some responsibility for and control over the servers that these threads will be connecting to, I wasn't looking forward to each client potentially attempting to establish 20+ connections at a time.

To gain some control over the number of possible concurrent connections I have introduced a sort of connectivity manager that can be configured to permit a limited number of connections, queuing others.
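
The core of the idea can be sketched with a counting semaphore. This is a simplified illustration rather than the code from the project; `ConnectionLimiter` and its method names are invented for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

final class ConnectionLimiter {
    private final Semaphore permits;

    ConnectionLimiter(int maxConcurrent) {
        // Fair semaphore: queued requests proceed in arrival order.
        this.permits = new Semaphore(maxConcurrent, true);
    }

    // Runs the request once a permit is free; callers beyond the limit
    // block (queue) here until an earlier request finishes.
    <T> T execute(Callable<T> request) throws Exception {
        permits.acquire();
        try {
            return request.call();
        } finally {
            // Always hand the permit back, even if the request fails.
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Each of the per-request threads would wrap its HTTP call in `execute(...)`, so the threads can still be started together while no more than the configured number of connections are open at once.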

So far I have only applied this approach to one of the several dozen operations that are exposed, but I can already see it as a way to reduce some of the duplication which has accumulated as a result of proof of concept code being adopted into the main project without refactoring.

Thursday 18 August 2011

Stagnating open source

I noticed some unusual behaviour in an application yesterday so I took a dive into the source code to see whether it might be something obvious.

In amongst some business logic for cache refreshing I found a call to CollectionUtils.removeAll. I hadn't personally used this CollectionUtils class before, so I decided to see what functionality it offered and how it worked. The Javadoc looked okay, but the implementation was clearly broken - delegating to a method called retainAll which is very different to what was intended.
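
To make the difference concrete, here is a small JDK-only sketch of the two behaviours. `CollectionOps` is my own illustrative helper and not the Commons Collections code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class CollectionOps {
    private CollectionOps() {}

    // What a removeAll helper is expected to return:
    // the elements of 'source' that are NOT in 'remove'.
    static <T> List<T> removeAll(Collection<T> source, Collection<?> remove) {
        List<T> result = new ArrayList<>(source);
        result.removeAll(remove);
        return result;
    }

    // What delegating to retainAll gives instead:
    // only the elements of 'source' that ARE in 'retain'.
    static <T> List<T> retainAll(Collection<T> source, Collection<?> retain) {
        List<T> result = new ArrayList<>(source);
        result.retainAll(retain);
        return result;
    }
}
```

With a source of a, b, c and a second collection containing only b, the first method leaves a and c while the second leaves just b, so a cache refresh relying on the wrong one would keep precisely the entries it meant to discard.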

Since this CollectionUtils class is part of the Apache Commons Collections component I went looking for whether this was a known issue, and whether it had been fixed in a later version.

According to the Apache Jira system the problem was recognised and fixed 5 years ago: https://issues.apache.org/jira/browse/COLLECTIONS-219

Unfortunately it would appear that there has not been a new release in those 5 years.

I suppose that there would not be many applications using the broken removeAll method, as it was only introduced in the last released version.

Perhaps Commons Collections should be moved into the Dormant projects section of Apache?

A productive week so far

One of my colleagues has been on leave this week so I have had a little more freedom to try out changes to the current project without worrying about potential conflicts when checking the changes into the version control system.

One of the changes that I had been contemplating for a while involved introducing Spring Flex so that our main integration point between BlazeDS and our Java web application can make use of the same dependency injection as the rest of the application.

As usual, the documentation was quite thorough, but due to my lack of familiarity with Flex I found some blog posts more useful.

Now that Spring had responsibility for dealing with the MessageBroker and creating the Java object that it communicates with, I was able to do away with a few dozen lines of context.getBean("someBeanName") - inversion of control is our friend :)

Running the application seemed fine, so I moved on to some memory usage profiling.

jvisualvm is a very useful tool for observing what is really going on in the memory and CPU usage of a Java application. Over the course of an afternoon I detected that there were three instances of something that should have been a singleton, and at least one case of a new Thread being created each time a particular screen was visited - without terminating the previous Thread.

Wednesday 17 August 2011

Killing your competitor's killer app

Up until last week I was blissfully ignorant of the popularity of BlackBerry devices among young people. I was stuck in the mindset that business users and Obama were the main fans of the technology because of its email capabilities.

The mainstream media have been heavily publicizing the use of BlackBerry's other messaging capabilities and how they were applied by naughty people to coordinate disorder.

I wonder if sales of BlackBerry devices will be negatively affected by the discussions about giving police powers to shut down services in times of unrest etc.

I see Apple have managed to arrange for Samsung's new tablet to be temporarily embargoed from sale in most of the EU.

It would make a terrible movie plot to link these events.

Thursday 2 June 2011

Is London actually quite small?

I am now eight months into my current job and I have been involved in a handful of projects which have somehow involved no fewer than four of the other organisations that I interviewed with in the last couple of years.

Interestingly each of the organisations had either made me some sort of offer, or is still trying to recruit me.

I didn't do a shotgun approach to applying for jobs, so is the industry really quite small?

Update: Now my former employer is trying to form a strategic partnership with my current employer. I'm not quite arrogant enough to say it's because they miss me so much.

Tuesday 17 May 2011

Dependency management - the gotchas

Love it or hate it, Maven and its ilk have reached such a level of use in mainstream Java projects that it is generally simpler to set up your project dependencies using pom.xml file(s) rather than manually obtaining the appropriate jars for the libraries that you need to make use of.

Bringing in a simple standalone library that is self-contained (i.e. a single jar with no dependencies) can be a simple matter of including a few lines of XML.

If your project is doing anything slightly non-trivial then there is a reasonable chance that it will, in turn, be making use of non-trivial dependencies - which depend on other third party libraries/jars. These are termed transitive dependencies.

If you are not paying attention to the way transitive dependencies are defined in third party POM files, you can experience what a few people whom I generally respect for their software development experience refer to as the phenomenon of "Maven downloading the Internet".

Sonatype has circulated a blog post on that subject:

http://www.sonatype.com/people/2011/04/how-not-to-download-the-internet/

I would like to highlight a couple of gotchas, surprising things that can slip into your project if you don't pay enough attention.

Take for example the popular logging system log4j. If you simply specify in your POM file that you want log4j included in your project, would you expect javax.mail to be brought along with it?

javax.mail then has a dependency on the activation framework - which could be another jar.

This is a relatively basic example of a system specifying dependencies for functionality that you might not actually make use of. In this case log4j can allow you to specify an SMTPAppender to send out email messages when significant events are logged.

I first tried to use the SMTPAppender several years ago on a project that wasn't making use of a dependency management system, so I ended up scratching my head wondering why the application was silently failing every time that something should have been logged using the SMTPAppender. Any guesses what the problem was? - I hadn't deployed javax.mail or the activation framework jar.

The log4j example above might be considered as an example of how dependency management can stop you from making silly mistakes, so the example that follows will show how the opposite can also be true.

On a fairly recent project I have been making use of Jersey to expose RESTful web services with the JAX-RS standard.

Partway through the project I was following some posts on the user group and realised that the latest version would allow us to have our JSON objects represented in a much more efficient manner, so I simply updated the version number for the relevant dependencies in my project pom.xml file and set Maven off to download and assemble a fresh artifact.

Everything went fine until I attempted to deploy the generated WAR file. Then, much to my surprise, I began to see a warning about the inclusion of a jar file containing the servlet API - something which Tomcat is thankfully clever enough to exclude from the classpath.

Using the m2eclipse Maven plugin for Eclipse I was soon able to determine that the Jersey 1.6 server artifact included a dependency upon the servlet 3.0 API.

I received a prompt response to a post on the Jersey user group advising that the error would be corrected in the next nightly build, but I am reluctant to deploy with a nightly build.

I don't know about you, but I really don't like seeing warnings in my application logs. So after a few days of tolerating the warning and checking on the status of version 1.7 of Jersey I decided to eliminate the problem by overriding the dependency using an alternative scope in a dependencyManagement section of my project's POM file. Not ideal, but effective.

So, the main thing to take away from these cases is that you should take the time to examine what your project dependency tree looks like whenever you bring in a new dependency or change the version of an existing dependency.

Friday 15 April 2011

HTML5 Web Sockets - not just a browser thing

There is a lot of hype going around at the moment related to the wonderful functionality that websites will be able to offer once browsers implement the so-called "HTML5" features.

Some features should enable graphical capabilities without requiring a plugin (bye bye Flash), others are targeted at mobile devices - such as enabling the capture of an image from a camera.

The one that has been most relevant to my current project is Web Sockets.

In my opinion the key to Web Sockets living up to their full potential is that HTTP servers must be able to support the model of keeping connections open.

At the time of writing, Apache httpd does not support this as it is architected for synchronous processing of requests and sending of responses.

Wednesday 13 April 2011

Technical challenges streaming data over HTTP

I have recently been exploring options for enhancing the functionality of an existing website by integrating some streaming data feeds.

In the interests of making this functionality available on a wide range of devices (read iPads and iPhones as well as PCs) I would like this to be achieved without Flash or any other proprietary plug-in technology.

So, my old friend and foe JavaScript is the lead candidate for providing the functionality on the client side.

Almost every article and blog post that I have read on the subject emphasises that client side polling is not appropriate, and that instead a continuous connection must be kept open between the web browser and the HTTP server.

To avoid some of the challenges posed by the cross-site security restrictions that browsers place on JavaScript (the same-origin policy), we would ideally push the data feeds out from the same domain name as the page that contains the feeds. For less data-intensive AJAX functionality this was made possible by hosting the servlet engine behind Apache httpd.

Historically the way that most HTTP servers have served their content involved dedicating a thread or process to serve each incoming request from start to finish. This does not scale well when incoming requests need to hold a connection over a prolonged period of time, so alternative approaches such as Comet have been implemented.

The Tomcat implementation of Comet is not supported when Tomcat is hosted behind Apache httpd over AJP.

I was considering keeping the data serving system physically separate from the serving of HTML content anyway, so this restriction is just giving a friendly push to try using a separate sub-domain.

Wednesday 16 February 2011

Twitter protocol/etiquette

Okay, so I'm on Twitter and I generally check for updates at least once a day.

I used to have a policy of blocking followers who followed at least 3 times as many tweeters as followed them because they smelt like spammers / marketers (particularly those that magically tweet every x hours, no matter the time of day or night).

Sure enough, now I find myself edging towards my own threshold of ignorability. (Don't worry, I have a higher tolerance for those who make up words, though the same cannot be said for those who misspell real words).

Perhaps I will have to post a few more fascinatingly followable tweets to balance the ratio?

Wednesday 5 January 2011

Integrating Axis2 services into an existing web app

I started off with a handful of inherited projects loosely cobbled together with no declarative dependency management (old style - directory of jars in each project, copied into one big mess for deployment).

My initial task was to investigate using the existing system to expose access to the same data sources using REST, without breaking the existing SOAP implementation.

I muddled around for a couple of days and was reasonably happy with what Jersey was able to produce based on JAX-RS, but I wasn't satisfied with the amount of manual steps in the build process.

As a stop-gap measure I cobbled together some Ant magic for making the downwind projects aware of their upwind dependencies, so any changes would be detected during a rebuild and a completely up to date WAR would be produced from a single command.

A few weeks later the client's main supplier disappeared over the Christmas break. As I was stuck with a number of unanswered questions, I took the opportunity to make the external dependencies of each project explicit, and to re-enable the SOAP functionality as part of the webapp.

Axis2 doesn't lend itself well to being deployed as part of another application; it is designed more for the flexibility of deploying services within it than for being deployed into something else.

After a bit of experimentation I had Axis2 SOAP services sitting alongside my REST services.

When I came to untangle the combined dependencies of Axis2 and the project I found a potentially nasty clash of versions of the commons-httpclient jar. A bit of Googling and reading of Jira posts gave me a fair indication that it wasn't likely to change in Axis2 any time soon.

So, I tried something a little drastic - replacing axis2 with CXF.

CXF is more suited to our purposes and by happy coincidence already integrates with Spring 3.

And we all lived happily ever after.