Late last week I found myself in a mini spiral of database updates for a reporting system - removing an outer join in the morning, only to add another outer join in the afternoon.
Sure enough, this week I am taking a look at the bigger picture - where the data is coming from, and why it is so ssslllloooowwwww to come out. Late yesterday afternoon / early evening I looked into how the ETL tables are being populated and realised that the data in the new table I had started joining against could easily be included in the denormalised structure we have set up especially for reporting. I also came to appreciate why some other data had been included a couple of weeks back (before I had the bright idea of excluding it and adding a join to a later query).
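To make that concrete, here is a minimal sketch of the kind of change involved - every table and column name below is invented for illustration, since the actual schema isn't named here:

```sql
-- Before: the report query had to reach out to the new table at read time.
SELECT r.entity_id, r.metric_value, s.status_label
FROM reporting_fact r
LEFT OUTER JOIN entity_status s ON s.entity_id = r.entity_id;

-- After: the ETL step copies the column into the denormalised reporting
-- table once (assuming a status_label column has been added to
-- reporting_fact), so the report query needs no join at all.
UPDATE reporting_fact
SET status_label = (SELECT s.status_label
                    FROM entity_status s
                    WHERE s.entity_id = reporting_fact.entity_id);

SELECT entity_id, metric_value, status_label
FROM reporting_fact;
```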
I'm fairly certain that the changes I have applied today will improve performance considerably, and they should also make the data access code more readable for future reporting requirements. I was unsure whether it would make sense to split the ETL output into two tables, as that could be considered a step back towards normalisation - but in this case it paid off: the queries are simpler, because they no longer have to filter out duplicate rows for data that is common to a collection of entities related for a set period of time.
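A rough sketch of that two-table split, again with invented names - the data common to a collection of related entities for a given period lives once in a header table, while the per-entity rows reference it instead of repeating it:

```sql
-- Common, per-collection data: stored once per collection and period.
CREATE TABLE etl_collection (
    collection_id  INTEGER PRIMARY KEY,
    period_start   DATE NOT NULL,
    period_end     DATE NOT NULL,
    shared_attr    TEXT  -- the same for every entity in the collection
);

-- Per-entity data: references the collection rather than repeating it.
CREATE TABLE etl_entity (
    entity_id      INTEGER PRIMARY KEY,
    collection_id  INTEGER NOT NULL REFERENCES etl_collection,
    metric_value   NUMERIC
);

-- Reporting queries become a plain join, with no duplicate rows to
-- filter out and no DISTINCT needed.
SELECT c.period_start, c.shared_attr, e.entity_id, e.metric_value
FROM etl_collection c
JOIN etl_entity e ON e.collection_id = c.collection_id;
```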
The "moving target" aspect of this exercise came in the form of some database structure updates being added by a colleague at around the same time. Essentially it has meant that I have no historic data to verify that the most complex report generated from this data is still working. In theory that should be a trivial obstacle to overcome, but it's the sort of surprise that can turn a morning's work into a couple of days' work if you're not careful.