Monday, March 5, 2007

Profiling a Build

I've been working on a product for the past few years. When we started, the build was a sub-minute job, but by the time of the first release, it had made its way up to five minutes, mostly in tests.

At the end of the second release, it was twenty minutes, again, mostly tests, although a UI smoke-test and some additional database tooling had been added. That was longer than I'd gotten used to, but livable, if need be.

Now, four or so releases later, we're at forty minutes. With our Continuous Integration server, developers can check in without running the entire cross-project build, as long as they take reasonable steps to ensure that the checkin is not likely to break the build. Even so, it has become clear that the "total build" time has grown, and will continue to grow unless we take measures to reduce it.

What Can We Do?
Although this is a challenge our project faces, our project is not unique. As a project matures, we add to it, in tests, tools, project structure. These additions cost time in the build process, and that time can hinder the productivity of the team.

As with any kind of performance improvements, it's best to start by measuring the performance and determining bottlenecks: the problem may not be where you think. This means: profiling your project's build.

Since our project is not unique, you'd expect to find information on profiling the build of a project. Surprisingly, I haven't, in six months of occasional queries, found much to satisfy my interest in this subject-matter. There seems to be a dearth of information on profiling builds.

Most build tools don't have any kind of built-in profiling capabilities. You could use a Java profiler, but these tend to lack the context-sensitivity to be truly valuable, particularly when you're dealing with infrastructure that tends to repeat.

Build Tools
I'm surprised that build tools (sucks-rocks.com) don't seem to come with some basic profiling capabilities, particularly Maven (1/2), which has a standard build process. Maven could easily tell me things like the time it takes to compile vs. test vs. create the distributable vs. deploy to the repo, or what percentage of the multi-project build each project takes. This kind of technology would not be difficult to build, so I'm surprised that no-one has gotten around to doing it. (And, no, I don't plan on spending the required time in Jelly myself. Jelly is evil.)

With a large multi-project build, it'd be really great to have some sense of where your time is being spent. Does your build tool tell you this kind of information?
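
To make that wish concrete, here is a rough sketch in Python of the kind of report I have in mind. It is not something Maven or any other build tool provides. The project names, phase names, and timings are all invented for illustration, and it assumes you have already collected per-project, per-phase durations somehow.

```python
from collections import defaultdict

# Hypothetical timing records: (project, phase, seconds). The names and
# numbers are invented for illustration, not measured from a real build.
TIMINGS = [
    ("core", "compile", 40.0),
    ("core", "test", 360.0),
    ("web", "compile", 60.0),
    ("web", "test", 540.0),
    ("web", "package", 100.0),
    ("web", "deploy", 100.0),
]


def share_by(records, key_index):
    """Sum seconds by one column and return {key: (seconds, percent_of_total)}."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    grand_total = sum(totals.values())
    return {
        key: (seconds, 100.0 * seconds / grand_total)
        for key, seconds in totals.items()
    }


def print_report(title, shares):
    """Print one section of the report, slowest entries first."""
    print(title)
    for key, (seconds, percent) in sorted(
        shares.items(), key=lambda item: item[1][0], reverse=True
    ):
        print(f"  {key:<10} {seconds:7.1f}s  {percent:5.1f}%")


if __name__ == "__main__":
    print_report("By phase", share_by(TIMINGS, 1))
    print_report("By project", share_by(TIMINGS, 0))
```

Even a table this crude would answer the two questions above: which phase dominates the build, and which project dominates the multi-project build.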

Profilers
Profilers are generic, so they tend to lack the context that makes it easy to interpret the results. That's not to say that you can't run your entire build through a profiler, just that the results will require more work to gather and interpret than a simple build-timing report would.
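
For comparison, a simple build-timing report needs very little machinery. The sketch below is one way to get coarse numbers from the outside, assuming your build can be split into separate commands. The step names and commands are placeholders I made up, so substitute whatever your build actually runs.

```python
import subprocess
import sys
import time


def time_steps(steps):
    """Run each (name, argv) step in order and return [(name, seconds, exit_code)]."""
    results = []
    for name, argv in steps:
        start = time.perf_counter()
        completed = subprocess.run(argv)
        elapsed = time.perf_counter() - start
        results.append((name, elapsed, completed.returncode))
        if completed.returncode != 0:
            break  # stop at the first failing step, as a build would
    return results


if __name__ == "__main__":
    # Placeholder steps: each one just starts a Python interpreter and exits.
    # Replace the argv lists with your real compile, test, and package commands.
    steps = [
        ("compile", [sys.executable, "-c", "pass"]),
        ("test", [sys.executable, "-c", "pass"]),
    ]
    for name, seconds, code in time_steps(steps):
        print(f"{name:<10} {seconds:8.2f}s  exit={code}")
```

This only measures wall-clock time per command, so it says nothing about what happens inside a step, but it carries exactly the build context that a generic profiler lacks.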

Some profilers attempt to look for context, to give you information about web requests, database transactions and queries, etc. Some even try to extend that context-finding control to you, the developer (see the interceptor API). But none of them that I'm aware of comes with any sense of a build context out of the box, which is a shame.

Other Approaches and Closing the Feedback Loop
So what do you do? How do you profile the build in your project? Have you found a way to do this that's more effective than anything I've considered here?
