Monday, June 2, 2008

Detecting Intermittent Build Failure with Timed Builds

Almost every project I've been on eventually reaches a point where one or more tests fail sporadically. Usually, this indicates a problem with the test itself, such as a reliance on timing, but occasionally it's a problem with the unit under test, or with its testability. For instance, integration testing of concurrency in a web application is difficult, as you cannot control the timing of the threads inside the application server.

These problems often make themselves known in subtle ways, by failing only on occasion. If the developers working on the project aren't proactive, it's not uncommon to reach a point where any build that fails on a "known broken" test is simply run again, which can waste a lot of productive time.

To find these kinds of intermittent failures, we often run timed builds in the off-hours. In the periods when we're unlikely to be developing (at night and on weekends), we have a build trigger set to run project builds repeatedly. In Bamboo, this is a scheduled build with a cron expression like: 0 0/20 0-7,19-23 ? * * (which fires every 20 minutes from 7 p.m. through 7:40 a.m., every day). When we come back the next day or after a weekend, we have a large number of build results waiting, and if we have an intermittent failure, it's likely to have come up at least once, often more than once.
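
To get a feel for how many builds that schedule produces, here's a minimal Python sketch that expands the minute and hour fields of the expression above. The helper names (expand_field, daily_fire_times) are my own and purely illustrative, and the parser is an assumption-laden simplification: it handles only the simple field syntax used here, not the full Quartz cron grammar, and it ignores the day and month fields.

```python
def expand_field(field, low, high):
    """Expand a simple cron field into a sorted list of integers.

    Supports '*', single values, 'a-b' ranges, 'a/step' increments and
    comma-separated lists of those. Combined forms such as 'a-b/step'
    are NOT supported in this sketch.
    """
    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(low, high + 1))
        elif "/" in part:
            start, step = part.split("/")
            start = low if start == "*" else int(start)
            values.update(range(start, high + 1, int(step)))
        elif "-" in part:
            first, last = part.split("-")
            values.update(range(int(first), int(last) + 1))
        else:
            values.add(int(part))
    return sorted(values)


def daily_fire_times(expression):
    """Return the (hour, minute) pairs at which the trigger fires in one day.

    Assumes the six-field layout: seconds minutes hours day-of-month
    month day-of-week. Only the minutes and hours fields are expanded.
    """
    _seconds, minutes, hours, _dom, _month, _dow = expression.split()
    return [
        (hour, minute)
        for hour in expand_field(hours, 0, 23)
        for minute in expand_field(minutes, 0, 59)
    ]


times = daily_fire_times("0 0/20 0-7,19-23 ? * *")
print(len(times))           # 39
print(times[0], times[-1])  # (0, 0) (23, 40)
```

Thirty-nine builds per day is enough that a test failing even one time in ten is very likely to show up in the results by morning.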

This does take a little effort, but it's a step towards promoting the overall health of the build.
