Hammerhead: moving performance testing upstream

Today at The Ajax Experience, I released Hammerhead, a Firebug extension for measuring page load times.

Improving performance starts with metrics. How long does it take for the page to load? Seems like a simple question to answer, but gathering accurate measurements can be a challenge. In my experience, performance metrics exist at four stages along the development process.

  • real user data – I love real user metrics. JavaScript frameworks like Jiffy measure page load times from real traffic. When your site is used by a large, diverse audience, data from real page views is the ground truth.
  • bucket testing – When you’re getting ready to push a new release, if you’re lucky you can do bucket testing to gather performance metrics. You release the new code to a subset of users while maintaining another set of users on the old code (the “control”). If you sample your user population correctly and gather enough data, comparing the before and after timing information gives you a preview of the latency impact of your next release.
  • synthetic or simulated testing – In some situations, it’s not possible to gather real user data. You might not have the infrastructure to do bucket testing and real user instrumentation. Your new build isn’t ready for release, but you still want to gauge where you are with regard to performance. Or perhaps you’re measuring your competitors’ performance. In these situations, the typical solution is to do scripted testing on some machine in your computer lab, or perhaps through a service like Keynote or Gomez.
  • dev box – The first place performance testing happens (or should happen) is on the developer’s box. As she finishes her code changes, the developer can see if she made things better or worse. What was the impact of that JavaScript rewrite? What happens if I add another stylesheet, or split my images across two domains?
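To make the bucket testing stage above a little more concrete, here is a minimal Python sketch of the two pieces it needs: placing users into buckets without bias, and comparing the timing data from the two buckets. Everything in it is an assumption for illustration: the function names, the 10% treatment size, and the load times are made up, and hashing the user ID is just one common way to get a stable, unbiased split.

```python
import hashlib
from statistics import median


def assign_bucket(user_id: str, treatment_percent: int = 10) -> str:
    """Deterministically place a user in 'treatment' or 'control'.

    Hashing the ID spreads users evenly across 100 slots and keeps each
    user in the same bucket on every visit. (Hypothetical helper.)
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    slot = int(digest, 16) % 100  # stable value in 0..99
    return "treatment" if slot < treatment_percent else "control"


def median_delta_ms(control_ms, treatment_ms):
    """Median load time of the new code minus that of the old code."""
    return median(treatment_ms) - median(control_ms)


# Made-up load times in milliseconds, for illustration only.
control = [1200, 1350, 1280, 1500, 1310]    # users on the old code
treatment = [1100, 1250, 1190, 1400, 1220]  # users on the new code

print(assign_bucket("user-42"))
print(median_delta_ms(control, treatment))  # negative means the new code is faster
```

With real traffic you would need far more samples than this, and you would want to check that the two buckets have a similar mix of browsers and connection speeds before trusting the difference.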

Performance metrics get less precise as you move from real user data to dev box testing, as shown in Figure 1. That’s because, as you move away from real user data, biases are introduced. For bucket testing, the challenge is selecting users in an unbiased way. For synthetic testing, you need to choose scenarios and test accounts that are representative of real users. Other variables of your real user audience are difficult or impossible to simulate: bandwidth, CPU power, OS, browser, geographic location, etc. Attempting to simulate real users in your synthetic testing is a slippery, and costly, slope. Finally, testing on the dev box usually involves a single scenario, on a machine with a more powerful CPU than the typical user’s and an Internet connection that is 10-100 times faster.

Figure 1 - Precision and ability to iterate along the development process

Given this loss of precision, why would we bother with anything other than real user data? The reason is speed of development. Dev box data can be gathered within hours of a code change, whereas it can take days to gather synthetic data, weeks to do bucket testing, and a month or more to release the code and have real user data. If you wait for real user data to see the impact of your changes, it can take a year to iterate on a short list of performance improvements. To quickly identify the most important performance improvements and their optimal implementation, it’s important to improve our ability to gather performance metrics earlier in the development process: on the dev box.

As a developer, it can be painful to measure the impact of a code change on your dev box. Getting an accurate time measurement is the easy part; you can use YSlow, Fasterfox, or an alert dialog. But then you have to load the page multiple times. The most painful part is transcribing the load times into Excel. Were all the measurements done with an empty cache or a primed cache, or was that even considered?

Hammerhead makes it easier to gather performance metrics on your dev box. Figure 2 shows the results of hammering a few news web sites with Hammerhead. By virtue of being a Firebug extension, Hammerhead is available on a platform that web developers are familiar with. To set up a Hammerhead test, you add one or more URLs to the list and specify the “# of loads”. Once started, Hammerhead loads each URL the specified number of times.

Figure 2 - Hammerhead results for several news sites

The next two things aren’t rocket science, but they make a big difference. First, there are two columns of results, one for empty cache and one for primed cache. Hammerhead automatically clears the disk and memory cache, or just the memory cache, between page loads to gather metrics for both of these scenarios. Second, Hammerhead displays both the median and the average load time. Additionally, you can export the data in CSV format.
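The workflow described above (load each URL several times in each cache state, then report median and average and export CSV) can be sketched in a few lines of Python. This is not Hammerhead’s code, which runs inside Firefox; it is only an illustration of the bookkeeping. The `load_page` callable, the function names, the CSV column names, and the timings are all hypothetical.

```python
import csv
import io
from statistics import mean, median

FIELDS = ["url", "cache", "loads", "median_ms", "average_ms"]


def hammer(urls, loads, load_page):
    """Load each URL `loads` times per cache state and record the times.

    `load_page(url, cache_state)` is a hypothetical callable supplied by
    the caller. It must put the cache in the requested state, load the
    page, and return the load time in milliseconds.
    """
    results = {}
    for url in urls:
        for cache_state in ("empty", "primed"):
            results[(url, cache_state)] = [
                load_page(url, cache_state) for _ in range(loads)
            ]
    return results


def summarize(results):
    """Return one row per URL and cache state with median and average."""
    return [
        {
            "url": url,
            "cache": cache_state,
            "loads": len(times),
            "median_ms": median(times),
            "average_ms": mean(times),
        }
        for (url, cache_state), times in results.items()
    ]


def to_csv(rows):
    """Serialize the summary rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Stand-in loader that replays made-up timings instead of driving a browser.
canned = {"empty": iter([900, 1000, 2600]), "primed": iter([400, 420, 410])}


def fake_load_page(url, cache_state):
    return next(canned[cache_state])


rows = summarize(hammer(["http://example.com/"], 3, fake_load_page))
print(to_csv(rows))
```

In the made-up empty cache data, one slow load (2600 ms) pulls the average up to 1500 ms while the median stays at 1000 ms, which is why it helps to see both numbers side by side.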

Even if you’re not hammering a site, other features make Hammerhead a useful add-on. The Cache & Time panel, shown in Figure 3, shows the current URL’s load time. It also contains buttons to clear the disk and memory cache, or just the memory cache. It has another feature that I haven’t seen anywhere else: you can choose to have Hammerhead clear these caches after every page view. This is a nice feature for me when I’m loading the same page again and again to see its performance in an empty or a primed cache state. If you forget to switch this back, it gets reset automatically the next time you restart Firefox.

Figure 3 - Cache & Time panel in Hammerhead

If you don’t have Hammerhead open, you can still see the load time in the status bar. Right-clicking the Hammerhead icon gives you access to the cache-clearing options. The ability to clear just the memory cache is another valuable feature I haven’t seen elsewhere. I feel this is the best way to simulate the primed cache scenario, where the user has been to your site recently, but not during the current browser session.

Hammerhead makes it easier to gather performance metrics early in the development process, resulting in a faster development cycle. The biggest bias in dev box measurements is that most developers have a much faster connection than the typical user. Easy-to-install bandwidth throttlers are a solution. Steve Lamm blogged on Google Code about how Hammerhead can be used with bandwidth throttlers on the dev box, bringing together both ease and greater precision of performance measurements. Give it a spin and let me know what you think.