Thoughts on software development: 2019

Monday, 16 December 2019

Revisiting exploring relatedness of videos using youtube APIs

Introduction

Way back in 2012 I dabbled with YouTube's APIs to see whether two videos could be shown to be related by following their "related videos" outwards until either a common video turned up or the accumulated graph ran out of non-duplicate videos.

I've recently realised that the code from that particular experiment was only held on my old laptop that died a few months ago. This is a good excuse to re-write it from scratch and produce something a little more useful - such as listing out the videos along the related path.

Blocker

YouTube's APIs have changed, so the approach that I took seven years ago won't work any more due to quota restrictions on API calls.
The only call that can bring back the related videos data is a search, and the minimum quota cost for a single search call is 100 units. With a daily limit of 10,000 units that would restrict us to 100 searches - which is simply not enough to build up a graph between videos unless they are very closely related.
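
To make the quota problem concrete, here is a minimal sketch of the kind of search I have in mind, written from scratch rather than recovered from the 2012 code. It expands outwards from both videos in turn and stops when a common video is found or the lookup budget runs out. The class name, the `relatedLookup` function and the `maxLookups` parameter are all made up for illustration; `relatedLookup` stands in for whatever call returns related video ids, and no real YouTube API call is made here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

class RelatedVideoSearch {
    // Figures taken from the post: daily quota and the cost of one search call.
    static final int DAILY_QUOTA = 10_000;
    static final int SEARCH_COST = 100;

    /** Number of related-video lookups that fit into one day's quota. */
    static int maxSearchesPerDay() {
        return DAILY_QUOTA / SEARCH_COST;
    }

    /**
     * Alternately expands the graph from each starting video, one lookup at a time.
     * Returns the first video reachable from both sides, or empty if the budget
     * is used up or there is nothing left to expand.
     */
    static Optional<String> findCommonVideo(String first, String second,
                                            Function<String, List<String>> relatedLookup,
                                            int maxLookups) {
        if (first.equals(second)) {
            return Optional.of(first);
        }
        Set<String> seenFromFirst = new HashSet<>(List.of(first));
        Set<String> seenFromSecond = new HashSet<>(List.of(second));
        Deque<String> queueFirst = new ArrayDeque<>(List.of(first));
        Deque<String> queueSecond = new ArrayDeque<>(List.of(second));
        int lookups = 0;
        boolean expandFirst = true;

        while (lookups < maxLookups && (!queueFirst.isEmpty() || !queueSecond.isEmpty())) {
            Deque<String> queue = expandFirst ? queueFirst : queueSecond;
            Set<String> seenHere = expandFirst ? seenFromFirst : seenFromSecond;
            Set<String> seenThere = expandFirst ? seenFromSecond : seenFromFirst;
            expandFirst = !expandFirst;
            if (queue.isEmpty()) {
                continue; // this side is exhausted, let the other side take a turn
            }
            String current = queue.poll();
            lookups++; // each lookup would cost SEARCH_COST quota units
            for (String related : relatedLookup.apply(current)) {
                if (seenThere.contains(related)) {
                    return Optional.of(related);
                }
                if (seenHere.add(related)) {
                    queue.add(related); // only queue videos not seen before on this side
                }
            }
        }
        return Optional.empty();
    }
}
```

Passing `RelatedVideoSearch.maxSearchesPerDay()` as the budget caps the search at 100 lookups in total, which is only a few levels deep if each video has a handful of related videos.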

Monday, 18 November 2019

Java 11 support added for AWS Lambdas

I've just deployed a little "Hello world" lambda that also logs out the system properties as verification that the setup is correct. Seems to work.

Wednesday, 13 November 2019

Connection pooling in AWS Lambdas

A few months back I posted my findings from troubleshooting a resource leak in a Java application that had been lifted and shifted to become a lambda rather than a long-running app on AWS.

The resource leak turned out to be the setup of a fresh pool of connections to the back-end cache on each invocation of the lambda, without any corresponding clean up call to close those connections.

Yesterday I attended the Redis Day conference in London during which a presentation happened to include some code for a much simpler use case involving connecting to a similar back end cache - Redis - but with a different approach to initialisation.

The lambda from the presentation was extremely lightweight, so there was no bloated microservice framework involved. Just a single method to perform a single task. The initialisation of the connection to the Redis system was performed in a static block and had no corresponding call to close it. I believe that this approach would ensure that the Redis client is reused between invocations of the lambda. It also appears to be a recommended pattern for setting up connection pools for other resources such as database connections - relying on the receiving end to clean up resources if / when the lambda's container is shut down, terminating the connection.

The static initialisation approach reduces the startup time for all but the first invocation of the lambda.
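
I don't have the presenter's code, so the following is only a sketch of the static initialisation idea as I understood it. `StubCacheClient` is a made-up stand-in for a Redis client (it opens no real connection), and `StaticInitHandler` is a hypothetical handler class that does not use the real AWS Lambda interfaces.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a Redis client; a real one would open a network connection. */
class StubCacheClient {
    // Counts how many clients have been created, to show the reuse.
    static final AtomicInteger CONNECTIONS_OPENED = new AtomicInteger();

    StubCacheClient() {
        CONNECTIONS_OPENED.incrementAndGet();
    }

    String get(String key) {
        return "value-for-" + key;
    }
}

/** Hypothetical handler; the method shape only loosely mirrors a lambda entry point. */
class StaticInitHandler {
    private static final StubCacheClient CLIENT;

    // Runs once, when the class is initialised in a fresh container.
    static {
        CLIENT = new StubCacheClient();
    }

    String handleRequest(String key) {
        // Every invocation served by this container reuses the same client.
        return CLIENT.get(key);
    }
}
```

However many times `handleRequest` is called within one JVM, `StubCacheClient.CONNECTIONS_OPENED` stays at one. Note that there is deliberately no close call anywhere.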

Monday, 11 November 2019

What's an ideal working environment?

Introduction
My most recent job interview included being asked about what I would consider an ideal working environment to be like. This post will be me thinking aloud about how to respond if the same type of question comes up again.

Team Structure - Serenity Now
Based on my twenty or so years of working as a professional software developer, I have found the most productive teams are those that are self-sufficient for the things that they need to develop, and have the big picture awareness to anticipate in advance what they need from others.
Partway through writing this section I thought about how this matches up to the serenity prayer:

"God, grant me the serenity to accept the things I cannot change,
Courage to change the things I can,
And wisdom to know the difference."

Being able to deliver working software at a steady sustainable pace is where I get a lot of satisfaction from my work. Having to wait for another team to produce something can be frustrating, so having strong communication skills is a key aspect of ensuring that progress is not blocked.

Team Diversity
This section title may sound like a team name from The Apprentice, or The X Factor, but for me it is all about having a broad range of worldviews and levels of experience.

In two of my jobs I have worked alongside students completing a year in industry as part of their university course's programme. Those projects were some of the most rewarding of my career in terms of developing my mentoring abilities and the satisfaction that can be gained from teaching a concept or the benefits of a methodology such as test driven development (TDD).

My half-joking contribution to the diversity of my latest team was that I was the only member originally from the southern hemisphere. The only time that I considered making a suggestion based on that background was when I heard that a team was contemplating introducing a snowy winter theme to the homepage of the product's website. It would have been summer in the southern hemisphere.

Thanks to the melting pot of cultures that London provides, I have also had the opportunity to learn from people of different cultures and religions about how they go about everyday things in life such as choosing what food stand they can visit at the local street markets, and managing their working day during periods of fasting.

Team size - no one size fits all
Depending on how you shape it, I've either been working in small teams, or large teams.

For the day to day incremental changes that my immediate neighbours and I worked on I have had a small team of three. For the effects that our changes have introduced to the wider development group the number goes up to something closer to fifty or sixty - not counting external consumers of our APIs who should have noticed no impact from our seamless changes.

The main project that I worked on during my time at Elsevier involved something closer to a hundred people. I had regular contact with around fifteen members of the core team, and interacted with a dozen or so others when we needed to address a publication version consistency problem.

Getting back to what I would consider to be an ideal working environment, I think a group of between five and ten people - depending on the range of skills required - is a good size:
- There is enough cover for keeping the work progressing if someone is ill or takes a holiday
- We can have a range of levels of experience
- There are enough people to be reasonably certain of having at least one strong opinion when a decision needs to be made

Inter-team socialising
I find that it's much easier to have a discussion with someone if you have already had an informal conversation with them about non-work matters. I'm not someone that runs workshops, so I'm not going to be that person who sets up icebreaker activities for people to get to know each other at the start of a multi-day meeting.

On the other hand, I do quite enjoy trivia, so sometime back in 2015 I introduced the Friday Pub Quiz as a mini social event for self-organised teams to participate in at the office's break-out area after the Friday afternoon knowledge share session.

It started off small with just a couple of teams, then expanded out to have a couple more teams, then I introduced the rule that the winning team would have to produce a quiz for the following week.

Four years, and three office moves later, Friday Pub Quiz is still a fairly regular event.
Individuals from different development teams, product owners, senior managers and designers sometimes meet each other for the first time to form a team, then find themselves presenting to their peers together the next week.

Office environment - open plan, but keep the noise down
I have always worked in open plan offices, so I don't see myself moving to my own walled off space anytime soon. I like the buzz of hearing pairs programming and discussing the next test or feature to be developed.

Having said that, there have been a few occasions when I might have preferred a different setup - mostly around the issue of noise.

Providing table tennis or foosball in the office environment is all well and good, but it really should be in its own space - not somewhere that has no doors between it and the working environment.

A pet hate of mine at a recent job was the daily crinkle of packets of crisps being eaten at some desks near mine. I consider it a bit like water torture when you don't know when that next noise is coming. I'm now wondering what the mouse and keyboard of those laptops would be like as that salty type of food is something I always need to wash my hands after eating.

Friday, 8 November 2019

Shaking off some job interview rustiness

Introduction - looking for a new challenge is a full time occupation
Having a long term secure job is great, but in general the longer that you are away from the process of looking for a new role, the less well prepared you will be for interviewing for the next one. Please don't take that as an indication that you should hand in your notice for the sake of it, but read on to learn from some of my experiences.

I find the whole process of looking for a new job to be quite a strange situation. Most people that I have worked with somehow find ways of arranging interviews and accepting an offer while they are still actively employed.

I don't like the awkwardness of sneaking around and being economical with the truth - not lying, but not giving a full explanation. So whenever I have come to the decision that it is time for me to change jobs I have resigned, served my notice period including all the appropriate knowledge transfer, and then gone to look for an appropriate company to work for next.

Fortunately in London there are many companies that have a technology team that needs someone with my skills and experience, so I have been able to afford to not ace every interview first time. What follows is a bit of self-analysis of my most recent experiences of being back on the job hunt in 2019.

First interview - brain fade
I did my early interview preparation no favours by taking a relaxing holiday for six weeks or so without touching a single line of code. Don't get me wrong, it was great to unwind and take some deep breaths and be away from thinking about the ins and outs of the product that I had been working on for the last year and a half. I just didn't appreciate that even twenty years of commercial experience isn't enough to be ready to churn out code at the drop of a hat.

You come to realise just how much you rely on the code completion features of an IDE when you try to jot down a technical solution on a piece of paper. String.charAt has been around forever, but it just escaped my memory when I needed it most. So I found myself going back to first principles and introducing an unnecessary array copy of the underlying characters. 🤦
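
For anyone else whose memory needs a jog, here is a small comparison of the two approaches. The task itself (counting upper-case letters) is made up for illustration and is not the interview question; the class and method names are likewise hypothetical.

```java
class CharAccess {
    /** Reads characters in place with charAt: no copy of the string's contents is made. */
    static int countUpperCaseWithCharAt(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isUpperCase(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    /** Same result, but toCharArray() first allocates a fresh copy of every character. */
    static int countUpperCaseWithArrayCopy(String text) {
        int count = 0;
        for (char c : text.toCharArray()) {
            if (Character.isUpperCase(c)) {
                count++;
            }
        }
        return count;
    }
}
```

Both methods give the same answer; the second simply pays for an extra array that the first never needs.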

Second interview - time management
For the second company that I interviewed at I made sure that I had dabbled with some code to solve some common types of low level exercises, then proceeded to make an entirely different type of mistake when it came to the interview. I didn't think enough about the time limit when I came to introduce myself, so ended up giving a five minute speech about my background when a thirty second high level description would have been perfectly sufficient.

Sure enough my technical solution became a little rushed, so instead of reaching for the modern obvious implementation approach of applying streams and lambdas I went back to first principles and applied a simple for loop - making it seem as though I wasn't familiar enough with some of the most widely used Java 8 functionality - despite having used that for over five years.

When it came to a whiteboarding session I was fine for describing how the solution that I had fully designed and implemented worked, but didn't pick up on the different nature of the company that I was interviewing at and the types of challenges that they have had to overcome. My main takeaway from this shortcoming was to look outside of the industry that I have been working in. Check out the various videos available on Youtube where vendors and other businesses describe their approach and the solutions that they produce.

During the third hour of the interview process I faced another group of people, this time probing my personality, attitude, and experiences of different cultures and situations. In the process of producing this blog post I have thought of an answer to one of their more peculiar questions - something that I have learnt from people I work alongside that came as an "aha" moment that I might not have considered before: Colleagues in the US generally get their medical cover through their employer, so it would be a major risk for them to leave their job without having another one lined up to go to.

For some of the more general, "Can you think of a time when..." situational questions I didn't have much of a response. Sometimes I feel fortunate that I haven't felt like I have worked in particularly stressful projects, but unfortunately this line of interview questioning doesn't seem to believe that's possible.

Third interview - getting better technically, just not enough emphasis on end users
I really appreciated having a thorough interview feedback call from the company's in house recruiter recently. The fact that most of it was positive and constructive is making me feel more confident that I am getting better at preparing for interviews and being able to present my thoughts to strangers. Even though I found myself apologising for being flustered and repeating the same introductory wording (so self conscious about repetition...) the tech interviewers expressed that they were pleased that I had managed to come around to a suitable workable solution to the hypothetical scenario that they had raised.

This particular interview had a slightly unorthodox format of including a product focus stage where other companies have a more general behavioural / soft skills stage. Here I didn't strike the right balance between referring to more recent projects versus projects that involved developing functionality for end users. Having spent the better part of a year largely self-managing migrating an internal framework and its associated libraries doesn't offer much to talk about when it comes to requirements and compromises and team dynamics.

Onwards and upwards
I was tempted to consider an offer to go to work at the third company that I interviewed at, but the lower pay rate range that they were suggesting would have been too much of a step down from what I know I am worth. I also had a couple of reservations about the approach that they have taken with their microservices, so I will leave those challenges for others to face.

In the next week or so I expect to be attending interviews at a few more companies that I have applied to. My confidence in my technical and social abilities remains high, and my preparedness for the interview processes has grown considerably.

Thursday, 7 November 2019

London potentially losing the best IT meetup organisation

I was shocked to read a tweet in my timeline a few days ago from Wendy Devolder, the main person behind Skills Matter:

"To all the people

@skillsmatter
who made us who we were, I’m very sad and sorry that

@skillsmatter
has gone in administration. A massive thanks to my beautiful team, who contributed so much beauty, passion, talent and experience, words cannot express my gratitude."

Skills Matter were the hosts of the very first IT meetup that I attended when I moved to London over eleven years ago, and also the hosts of the most recent meetup event that I attended just a few weeks ago.
Over the years they have moved from a quirky building in Clerkenwell to a more conventional setup near the Barbican, and finally into a large purpose-built central city facility near Moorgate.

Here is a link to the page on LinkedIn outlining the circumstances that led to this administration situation:
https://www.linkedin.com/pulse/skills-matter-appointed-administrators-wendy-devolder/

I really hope that someone was able to step forward to keep Skills Matter going before the deadline.

Monday, 28 October 2019

New blog dedicated to microservices

I liked my previous post's title so much that I have decided to spin up a new blog dedicated to my experiences and observations in the development and operation of microservices.

The title is, "New adventures in microservices," and you can find it at the following URL:

https://ms-blog.elegant-solutions.london/

You'll have to excuse me if I don't go into specifics about where or when I have encountered the issues that I will post about. There are no horror stories or embarrassments that I can recall, it's just professional courtesy not to disclose irrelevant details.

Friday, 25 October 2019

New adventures in microservices - reducing inter-service calls

Not so long ago I was working on a product that internally exposed an API to allow clients to keep in sync with a user's most recently read documents. I expect most readers of this blog will have used Amazon's Kindle or a similar online reading application so I won't have to explain any of the fundamentals of this functionality.

On the surface this "recently read" service was quite simple - just read the user's most recent records out of a specifically designed database table and present it to the client application.

A complicating factor in this particular system was that some documents belong to a group rather than an individual user, and as a result the rights to access documents could change over time.

I consider this to be a case study of when catering for an edge case can lead to unnecessary pressure on core systems. Approximately 75% of calls to the documents service were permission checks from the recently read service.

All of the clients of the recently read service would silently ignore any reference to a document that the client did not have, so the permission checks were completely unnecessary.

After consulting with the various teams involved I removed the permission checking calls. The response time of the recently read service improved, and the load on the documents service reduced significantly, so the documents service was able to run with fewer instances.

This was one of the rare cases when the best way to improve the performance of a service call was to remove it altogether.
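
A rough sketch of the before and after, heavily simplified. None of these names come from the real system: `documentsServiceAllows` stands in for the remote permission check against the documents service, and `clientView` stands in for the client behaviour of silently skipping documents it does not have.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class RecentlyReadSketch {
    /** Before: one permission check (a call to the documents service) per recent record. */
    static List<String> recentWithPermissionChecks(List<String> recentDocumentIds,
                                                   Predicate<String> documentsServiceAllows) {
        return recentDocumentIds.stream()
                .filter(documentsServiceAllows)
                .collect(Collectors.toList());
    }

    /** After: return the stored records as they are, with no inter-service calls. */
    static List<String> recentWithoutPermissionChecks(List<String> recentDocumentIds) {
        return List.copyOf(recentDocumentIds);
    }

    /** Client side: references to documents the client does not have are skipped. */
    static List<String> clientView(List<String> recentDocumentIds,
                                   Set<String> documentsHeldByClient) {
        return recentDocumentIds.stream()
                .filter(documentsHeldByClient::contains)
                .collect(Collectors.toList());
    }
}
```

In this simplified model, where the permission check and the client agree about which documents are accessible, the client ends up showing the same list either way. The only difference is that the first version makes one call to the documents service for every record returned.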

Wednesday, 23 October 2019

Recent Holiday Reading

Travelling to the other side of the world offers plenty of idle time in the airport shops. Turning up to the airport three hours before the flight is a given, when considering the hassle and cost involved if the first flight is missed.

Even if I have a book or two in my carry on bag I still peruse the paperback shelves at WH Smith or the local equivalent airport retailer.

So, here's a list of books that I have been reading during my latest holiday...

Technology books (on Kindle - since typical bookshops don't have these titles)

Site Reliability Engineering - How Google Runs Production Systems

Kubernetes Patterns - Reusable Elements For Designing Cloud Native Applications

Accelerate - Building and Scaling High Performing Technology Organizations

Not so technology books

Eric Idle - Always Look On The Bright Side Of Life

Permanent Record (Edward Snowden)

Interview preparation

I had a mental blank while attempting to jot down some code for solving a problem on paper during a recent job interview. Less than a minute after leaving the interview I remembered the name of a method that I should have been using instead of my clunky array manipulation. I put the experience down to a combination of being so reliant on IDE code completion, and not having written much code for a couple of months while I have been unwinding on holiday.

Alas, even my depth of knowledge on changes to HashMap’s implementation (List gets replaced by tree when 8 elements fall under the same bucket since Java 8) did not balance out for that particular glitch.

Oh well, it was always going to be unlikely to ace the first interview after a four year break.

Preparing for interviews versus skating to where the puck is going to be

"I skate to where the puck is going to be, not to where it has been." - Wayne Gretzky

I'm in the process of seeking out a new challenge in my career (no need to whisper - I've left my previous job on good terms and taken a nice long holiday - so I'm not sneaking around to speak to recruiters during work time).

While updating my CV I have been quite aware that once again I find myself without professional experience of the latest popular technology in my sector - Kubernetes ... and maybe Kafka.

It's very tempting to go away and complete a full-on course to fill in the gaps, but I've been in the industry for long enough to appreciate that there's still a reasonable chance that my next role might not include those technologies anyway.

So, for now I'll just have to find the right balance between revising what I already know, and reading and watching enough about the rest to be able to carry my end of a conversation.

Tuesday, 22 October 2019

How NOT to use LinkedIn

I've recently taken the decision to look for a new job. Much like most of my other career moves I have chosen to leave a comfortable position, treating looking for the next one as a full time endeavour - rather than being sneaky and booking a "dentist appointment" or taking time off to attend interviews.

In this modern era I thought that I would not need to update my CV, as LinkedIn is the go to place for publicising that I am available, and I have kept my profile there relatively up to date and complete.

My main discovery of the last few days has been that the "Projects" section in a LinkedIn profile is not very prominent. For example, if I choose to export my profile as a PDF document then none of the project information will be included.

The significance of this issue was reinforced when I attempted to export my LinkedIn profile across to a third party system as part of applying for a job. Sure enough none of the project information was carried across.

With this in mind, I will be restructuring my profile so that the key aspects of my project experiences will also be mentioned in the high level description section for each job that I have held. This should prevent the unpleasant experience of having to fill in a lot of gaps in face to face interviews where interviewers have only seen the top level of my LinkedIn profile as that has been provided to them by their recruiter.

I've also seen this situation as a reason to finally get around to buying and installing Word on my Mac, rather than cranking up my 11 year old Windows laptop for updating a CV to circulate.

Monday, 21 October 2019

Contributions to open source - it's not just about hardcore coding

Just a listing of some contributions that I have made to open source software, from creating my own code to making a third party's documentation a little bit more readable.

Created plugin for GoCD continuous integration server to enable polling of status of application in Cloud Foundry.
https://github.com/Sounie/springer-gocd-cloudfoundry-plugin

- Identified code change in Apache Camel that resulted in messages being deleted from an AWS SQS queue even when the application logic encountered an error path.

At the time we had a situation where a regular trickle of events would normally fail to process, resulting in retries and ultimately being automatically moved onto a dead letter queue.

When the DLQ stopped receiving messages it took a while to trace back what had changed. As an aside, this type of situation can be considered as a good motivation for making small distinct changes - this is where a continuous deployment pipeline is a real enabler.
CAMEL-9405 - Amazon SQS message deletion behaviour change on exception

- vavr (formerly known as javaslang).
Some wording tweaks

- Jenkins CI
Bugfix in translation perl script

- AdoptOpenJDK Docker image scripts
Correction to comment

- IntelliJ IDEA Findbugs plugin
Typo in label shown in IDE

Sunday, 20 October 2019

Google Search Console

A couple of years ago I realised that website owners can obtain access to information about how users end up reaching their site via Google search. Now I'm getting around to setting that up for this blog site.

I'm not expecting any high numbers of visitors, but am a little bit curious to see which page of search results my content shows up on - and what sort of terms users are entering to reach here.

Friday, 11 October 2019

Takeaways from JAX London 2019

I attended the JAX London "The Conference for Java and Software Innovation" earlier this week. It was a great opportunity to keep up to date with what is happening with the core technologies that I have been using in my day to day work for most of my career so far. This post is a brief summary of some of my favourite tidbits.

In serverless computing size matters - smaller containers and apps mean less time is required for data transfer, and fewer class files mean less time for class loading which all feeds into how long it takes for the environment to be ready to execute.

Later releases of Java no longer have the concept of a separate, smaller JRE. If you want to deploy applications without the full JDK then jlink can be used to build a custom runtime image which only includes the modules that your app requires.
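
One way to see the effect of a custom runtime image is to ask the running JVM which modules it contains. This is a small sketch of my own (the class name is made up) using the standard `ModuleLayer` API that arrived with Java 9. On a full JDK it lists dozens of modules; run inside an image built with something like `jlink --add-modules java.base --output <dir>` I would expect it to list only what was asked for plus any dependencies.

```java
import java.util.List;
import java.util.stream.Collectors;

class RuntimeModules {
    /** Names of the modules in the boot layer of the running JVM, sorted. */
    static List<String> bootModuleNames() {
        return ModuleLayer.boot().modules().stream()
                .map(Module::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        bootModuleNames().forEach(System.out::println);
    }
}
```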

Efficiency improvements at scale can have environmental benefits - requiring fewer servers to perform the same work means less electricity is consumed.

It's okay to not know about every technology out there. Just as I was contemplating building up a mindmap of major technologies and the main current implementations, the presenter up front described how many different aspects there now are to software development - development tools, languages, deployment containers, continuous integration systems, service meshes, content delivery networks... - that's just some of the back end, I can't imagine anyone keeping a straight face when claiming to be a full stack developer and keeping up to date to the same extent.

Sometimes incremental improvements will be the best path to improving an existing system. Measure what it is currently doing, adjust something that looks like it is having an impact on the key performance indicator then measure again - rinse and repeat.

Other times it's best to throw away and start again - e.g. garbage collection settings when upgrading JDK version. The settings that made sense on Java 8 may not be necessary or optimal in Java 11. This is especially true if you're using CMS, as that is not expected to even exist in later versions of the JDK.
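
A quick sanity check after an upgrade is to ask the JVM which collectors it is actually running with, rather than assuming that the old flags still apply. This is a minimal sketch using the standard management beans; the class name is my own.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

class GcReport {
    /** One line per active collector: name, collections so far, total time in ms. */
    static List<String> describeCollectors() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(gc -> gc.getName() + ": " + gc.getCollectionCount()
                        + " collections, " + gc.getCollectionTime() + " ms")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

The collector names differ between JDK versions and settings, so the output also makes a handy "before" and "after" measurement when experimenting.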

JVM startup time regressed from Java 8 to Java 9, but has improved in subsequent releases. That might explain why AWS's Lambda implementation didn't move forward from Java 8 to Java 9 (not being a long term support version would also be a factor).

Tuesday, 18 June 2019

Troubleshooting AWS lambda too many open files

Some time ago - in the not so distant past - I was offered the opportunity to assist a colleague with troubleshooting a system that was facing some performance issues.

Problem 1: Dedicated caching server running out of memory and constantly swapping

This cache had been put in place to reduce the need to call out to other services for data that might be needed several times in a given time window.

The developers who had set up this system had moved on to other teams or companies, so we didn't have much context to go by as to whether there was any pattern to the distribution of requests for the data that could have hinted at a sensible expiry policy.

Solution 1: Get a bigger ~~boat~~ cache

Left to rely on the hit rate metric for the cache to tell us whether or not it is actually fit for purpose, we decided to replace the caching server with the next larger instance size - and also to specify the recommended parameters for reserved memory which seemed to have been missed in the existing setup.

With the new cache in place swapping did not return as an issue. However, errors were still showing up in the logs for the lambda that was utilising the cache. Experience says that we should never leave a job half-done.

Problem 2: Lambda running out of available file handles / sockets

The logs were showing a range of different types of errors that may or may not have been related:

Unable to perform a DNS lookup
Unable to load a class
Unable to connect to the cache

One of the error messages provided a vital hint: "too many open files".

This was my first experience of properly getting my hands dirty with AWS lambdas, so it was interesting to learn all about the way that it can keep an instance of the lambda around for much longer than the duration of the execution of a single call.

Our lambda was being called several times per minute which was enough to keep some instances of it around for long enough to run out of resources.

A bit of reading of the documentation revealed that we should be able to utilise over a thousand file handles / sockets in our lambda - which should be plenty.

Diagnosing the issue: Measuring open file handles at runtime

I had a theory at this point that the "Unable to load a class" issue may have been triggering the other issues, so I introduced some diagnostic logging that would output the total number of open file descriptors on each invocation of the lambda. Once this was in place it appeared that around three more files / sockets were open after each invocation - so the leak had nothing to do with the occasional class loading failure.
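
The diagnostic logging was along these lines. This is a reconstruction rather than the code that was deployed, and the class and method names are made up. It relies on `com.sun.management.UnixOperatingSystemMXBean`, which is specific to OpenJDK-style JVMs on Unix-like systems, so the sketch falls back to an empty result elsewhere.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.OptionalLong;

class OpenFileDescriptors {
    /** Current number of open file descriptors, if this JVM and OS expose it. */
    static OptionalLong currentCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unixOs =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return OptionalLong.of(unixOs.getOpenFileDescriptorCount());
        }
        return OptionalLong.empty();
    }

    /** Writes the count to standard out, which is where lambda logs are picked up from. */
    static void logCount(String label) {
        OptionalLong count = currentCount();
        System.out.println(label + " open file descriptors: "
                + (count.isPresent() ? String.valueOf(count.getAsLong()) : "unavailable"));
    }
}
```

Calling `OpenFileDescriptors.logCount("start of invocation")` at the top of the handler is enough to see whether the number creeps up from one invocation to the next.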

The next revelation came when I noticed that the lambda would load in a config file for each invocation, but was not showing any sign of an attempt to close that file afterwards.

Tidying up just that particular resource handling didn't make the problem go away.
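
For completeness, the tidy-up for the config file amounts to wrapping the read in try-with-resources. A minimal sketch, assuming a properties-format config file (the real format and the class name here are assumptions):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class ConfigLoader {
    /** Loads a properties file; the stream is closed even if load() throws. */
    static Properties load(Path configFile) {
        try (InputStream in = Files.newInputStream(configFile)) {
            Properties properties = new Properties();
            properties.load(in);
            return properties;
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + configFile, e);
        }
    }
}
```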

Solution 2: Properly clean up resources

After some more digging around I realised that the caching client had its own connection pool that is set up as part of each call, but not shut down afterwards. This wasn't trivial to change, as the setup for the caching client was nested within a few layers of Builder classes, but some refactoring enabled us to hook this into the lifecycle of the lambda invocation.

Monitoring of the logs showed that the open files count was now stable.
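
Stripped of all the Builder layers, the shape of the fix was roughly as follows. `StubConnectionPool` is a made-up stand-in for the caching client's pool and `PerInvocationHandler` is a hypothetical handler, so treat this as a sketch of the idea rather than the real change.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the caching client's connection pool. */
class StubConnectionPool implements AutoCloseable {
    // Tracks pools that have been opened but not yet closed.
    static final AtomicInteger OPEN_POOLS = new AtomicInteger();

    StubConnectionPool() {
        OPEN_POOLS.incrementAndGet();
    }

    String lookup(String key) {
        return "cached-" + key;
    }

    @Override
    public void close() {
        OPEN_POOLS.decrementAndGet();
    }
}

class PerInvocationHandler {
    String handleRequest(String key) {
        // The pool lives for exactly one invocation: it is closed on return or on exception.
        try (StubConnectionPool pool = new StubConnectionPool()) {
            return pool.lookup(key);
        }
    }
}
```

After any number of calls to `handleRequest`, `StubConnectionPool.OPEN_POOLS` is back at zero, which is the equivalent of the stable open files count in the logs.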

Conclusion

Lambdas are just like any other code we write. We need to pay attention to the lifecycle and ensure that resources are cleaned up when they have finished being used.

Next steps

Cache entry expiry and cache right-sizing

At the time of writing this post the new, larger cache is growing steadily and showing no sign of levelling off. So, there is no reason to expect that the swapping issue will not return.

The current cache logic is lacking a default expiration for entries that become stale. For some of the entries involved there is no reason to expect that the values being encountered will be re-used very often (some might not be encountered more than once a week / month / name your favourite time unit).
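
To illustrate what a default expiration means in practice, here is a toy in-memory version. The real cache is a separate server with its own expiry settings, so this is only a sketch of the behaviour, and every name in it is made up. The `now` supplier is there so that time can be controlled when trying it out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class ExpiringCache {
    private static final class Entry {
        final String value;
        final Instant expiresAt;

        Entry(String value, Instant expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Duration defaultTtl;
    private final Supplier<Instant> now;

    ExpiringCache(Duration defaultTtl, Supplier<Instant> now) {
        this.defaultTtl = defaultTtl;
        this.now = now;
    }

    /** Every entry gets the default time to live; nothing is stored forever. */
    void put(String key, String value) {
        entries.put(key, new Entry(value, now.get().plus(defaultTtl)));
    }

    /** Returns the value if present and not yet expired; expired entries are evicted on read. */
    Optional<String> get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (!now.get().isBefore(entry.expiresAt)) {
            entries.remove(key, entry);
            return Optional.empty();
        }
        return Optional.of(entry.value);
    }

    int size() {
        return entries.size();
    }
}
```

This version only evicts an entry when it is read again, so values that are never requested a second time would still linger. A real cache server would need to expire those in the background as well.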

Cache connection pooling optimisation
If AWS Lambdas supported shutdown hooks or some other mechanism for detecting when the instance is being abandoned then we could update the lambda to set up the cache connection pool at initialisation and the corresponding closing on shutdown - not today.

Thursday, 21 March 2019

The names have changed, the mistakes are the same

I decided not to stay on the monthly IT department catch up call this afternoon. Luckily I had plenty to keep me occupied a few minutes later.

> "Hey, does anyone know why the site's down?"
< "It's not down", "Ooh, it's slow, what's that about"
....
Long story short, somebody updated a security group config so one of the apps couldn't reach its cache. A few years ago it would have been a firewall, or iptables, or ipchains - now we're in the cloud.

Before we could get around to identifying and fixing that we had a bigger problem - users started to see server errors instead of just slow responses and timeouts.

Second long story short - there was an expired SSL certificate on another back end service.

This is 2019, and we're still making the same types of mistake as I used to see in the 1990s.