This post gives an overview of my recent experience developing a software system that keeps its core data fresh and ready to present to users.
It is a work in progress, so it may end up spanning a few posts.
What do I mean by an auto-refreshing cache?
A cache which has its content pre-loaded and refreshed automatically without user action.
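To make the idea concrete, here is a minimal sketch in Python. It is not the code from the system described here: the class name `AutoRefreshingCache`, the `loader` callable and the fixed refresh interval are all assumptions chosen for illustration.

```python
import threading


class AutoRefreshingCache:
    """Hypothetical sketch: pre-load a fixed set of keys, then refresh them on a timer."""

    def __init__(self, keys, loader, interval_seconds):
        self._keys = list(keys)
        self._loader = loader            # assumed: callable that fetches one key's data
        self._interval = interval_seconds
        self._data = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = None

    def refresh_all(self):
        for key in self._keys:
            value = self._loader(key)    # the slow fetch happens outside the lock
            with self._lock:
                self._data[key] = value

    def start(self):
        self.refresh_all()               # pre-load before any reader asks for data
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() has been called.
        while not self._stop.wait(self._interval):
            self.refresh_all()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def get(self, key):
        # Readers only ever touch the in-memory copy, never the remote source.
        with self._lock:
            return self._data.get(key)
```

The point of the design is that `get` never waits on the network: the only code that talks to the data source is the background refresh.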
Why do I want an auto-refreshing cache?
We want content on screen orders of magnitude faster than the data can be fetched from the data source across the network.
The remote data source and network connection are beyond our control.
There's more than one way data can change over time
There are two primary ways in which the data in this system changes over time, which should be reflected in the state of the cache:
- Data that should no longer be displayed because it is no longer relevant
- Data properties that change over time
The data to be cached includes some date and time properties which can be used as a basis for removal from the cache.
Due to the diverse nature of the data, we have multiple caches. In some caches an entity has its own entry keyed by a unique identifier, while in other caches multiple entities are grouped together under a key generated from the criteria used for the data source lookup (e.g. date and group id).
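As a rough illustration of the two keying styles, the sketch below uses plain Python dictionaries. The identifiers, field names and the `group_key` helper are made up for the example and are not taken from the real system.

```python
from datetime import date

# Style 1: one cache entry per entity, keyed by its unique identifier.
entity_cache = {}
entity_cache["evt-1001"] = {"id": "evt-1001", "name": "example entity"}


def group_key(day, group_id):
    """Build a key from the lookup criteria (here: date and group id)."""
    return (day.isoformat(), group_id)


# Style 2: several entities share one entry, keyed by the lookup criteria.
grouped_cache = {}
key = group_key(date(2024, 1, 15), 7)
grouped_cache.setdefault(key, []).append({"id": "evt-1001"})
grouped_cache.setdefault(key, []).append({"id": "evt-1002"})
```

With the grouped style, a single lookup returns everything the data source would have returned for those criteria, but removing one entity means rewriting the whole entry.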
For the grouped data, the approach for removing expired data involves:
- iterating over the cache entries
- checking that the overall cache entry is not due to expire
- obtaining a write lock on the cache entry
- iterating over the entities contained in the cached data structure and removing those that are expired
- putting the updated data back into the cache
- releasing the write lock on the cache entry
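The steps above can be sketched roughly as follows. This is an illustration under several assumptions, not the actual implementation: each cache entry is a dictionary with `expires_at` and `entities` fields, each entity carries an `end_time`, and a plain `threading.Lock` per key stands in for the cache's write lock. I also read "not due to expire" as "the entry's own expiry time is still in the future".

```python
import threading
from datetime import datetime, timedelta


def remove_expired(cache, locks, now):
    """Remove expired entities from grouped cache entries; return how many were removed."""
    removed = 0
    # Iterate over a snapshot so entries can be written back during the loop.
    for key, entry in list(cache.items()):
        # Skip entries that are themselves due to expire (assumed to be evicted elsewhere).
        if entry["expires_at"] <= now:
            continue
        lock = locks.setdefault(key, threading.Lock())
        with lock:  # acquired here, released automatically when the block exits
            current = cache[key]  # re-read under the lock
            kept = [e for e in current["entities"] if e["end_time"] > now]
            removed += len(current["entities"]) - len(kept)
            # Put the updated data back into the cache.
            cache[key] = {"expires_at": current["expires_at"], "entities": kept}
    return removed


# Example with made-up data: one entity in the group has already ended.
now = datetime(2024, 1, 15, 12, 0)
cache = {
    "2024-01-15/7": {
        "expires_at": now + timedelta(hours=1),
        "entities": [
            {"id": 1, "end_time": now - timedelta(minutes=5)},
            {"id": 2, "end_time": now + timedelta(minutes=5)},
        ],
    }
}
remove_expired(cache, {}, now)
```

The lock is held only while the filtered list is built and written back, so readers of other entries are not blocked.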
Automating updates of existing entries
For data that is already in the cache and not yet due for removal, we can re-fetch it from the data source and overwrite the cached copy.
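A minimal sketch of that refresh pass might look like the following. The `fetch` callable, the entry layout and the decision to keep the stale value when a fetch fails are my assumptions for illustration; the last one seems reasonable given that the remote source and network are beyond our control.

```python
from datetime import datetime, timedelta


def refresh_existing(cache, fetch, now):
    """Re-fetch every entry that is not yet due for removal; return the refreshed keys."""
    refreshed = []
    for key in list(cache):
        entry = cache[key]
        if entry["expires_at"] <= now:
            continue  # due for removal, so not worth refreshing
        try:
            fresh = fetch(key)  # assumed: slow call to the remote data source
        except Exception:
            continue  # assumed policy: keep serving the stale value on failure
        cache[key] = {"value": fresh, "expires_at": entry["expires_at"]}
        refreshed.append(key)
    return refreshed


# Example with made-up data and a stand-in fetch function.
now = datetime(2024, 1, 15, 12, 0)
cache = {"a": {"value": "old", "expires_at": now + timedelta(hours=1)}}
refresh_existing(cache, lambda key: "new", now)
```

Running this on a schedule means readers always see data that is at most one refresh interval old, without ever waiting for the fetch themselves.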