Wednesday, May 2, 2007

Forensic Code Reviews

Several times within the last year, I've conducted what amounts to a forensic code review: in order to find out how a portion of the codebase ended up in a particular state, I've had to delve through its history and gather evidence.

When you do you first one or two forensic code reviews, it's fascinating to see how much you really can dig up, if you try. The information's all there, if you're using a good source control system. You are using source control, aren't you?

Unfortunately, if you're using CVS, as I imagine many of you are, and you're refactoring your code, as I hope many of you are, it helps to have a pretty good understanding of how the codebase has changed. When conducting a forensic code review, I often have to go back beyond the last file location, which means I need to know where the code used to be -- particularly when the first comment doesn't make that clear. (Note: you can save yourself some pain here; use a good commit comment when moving or renaming files).

The Current Example
We have a work product that requires some heavy calculation to achieve. As an example, I'll use an insurance quotation that takes the customer's desired insurance needs, profile, autos, runs them through a proprietary set of business rules and creates a quotation. This isn't the domain of our system, but it's one that most people are familiar with, so it's an easier example.

Although the insurance quote is entirely derived information, the quotation process takes a fair amount of time to run, so we store the end result in the database. If the customer wishes to see the quote again, we simply retrieve it.

Before we retrieve it, though, we run a quick check to see if the quotation is valid. We check the date the quotation was calculated against the last date on which inputs to the quotation have changed. If the quotation was calculated after the last input change, then it's "clean", and we can offer it to the customer direct from the database. If the inputs have changed since the quotation was calculated, then the quotation is 'dirty', and we have to calculate a new one.

Before I get into the problem at hand, let me describe one other date: the calculation-effective-date. This is the date for which the calculation should be effective. There are a number of date-effective routines in the system that look up competitive data, choose versions of the business logic, check against date-sensitive actuarial tables, that sort of thing. In order to ensure that a customer doesn't experience an unexpected change in rates between one "what if" quotation and another, we use the beginning of the insurance quotation session as the calculation effective date. This means that a customer won't hit "recalc" and be presented with radically different results for almost-identical data.

The Problem
Recently, we've seen that the quotation process seems to be invoked more often than it should be. In the process of making another change, I discovered why.

Where we used to record the current date as the date on which the quotation was calculated, we seem to have started using the calculation effective date. Since this calculation effective date is the beginning of the quotation session, this is usually before the last input data changed. The system thinks the quotation is always dirty, and thus calculates a new quotation each time we ask to see it.

The Causes
There are two independent things that caused this problem. At a point in the not-too-distant past, we used the 'current date' at the time of calculation for calculation-effective routines. This meant that one quotation was internally consistent, but it might not be consistent with the last quotation the customer received, minutes ago.

The code that invoked the quotation engine created a date variable, calculationEffectiveDate, then went on to use that both as the calculation effective date, and as the date on which the solution was calculated. This worked for a long time, because the two were effectively the same.

However, when we moved to using the session start, another developer came along and revised the assignment to calculationEffectiveDate so that the session's start was used for this. Unfortunately, this variable, and thus date, was, as described above, also used as the date on which the solution as calculated. This caused the caching failures we've already described.

There are a few conclusions here:

  1. Name your variables for how they are used; if a variable has two uses, make sure that's clear in the name, or use two variables.
  2. When modifying existing code, make sure you understand the impact. In this instance, judicious use of Eclipse's "Mark Occurrences" would have quickly shown the variable whose assignment was being modified was being used in two distinct ways.
  3. Don't use source control; there's always some bastard like me waiting to dig up your mistakes later, and blame you for them. :)


Anonymous said...

I'd fire the bastard for that.


Anonymous said...
This comment has been removed by a blog administrator.