The Art of Capacity Planning
This week I opened a beautiful package from O’Reilly. It contained John Allspaw’s new book, The Art of Capacity Planning. The cover is a delight to look at, but you shouldn’t judge a book by its cover! Luckily, what’s inside the book is also a delight.
Now, let me make a disclaimer. I know John. We first crossed paths at Yahoo!, and have worked together on some side projects, most notably the Velocity conference. I know he’s an expert in the area of capacity planning. I know he’s highly regarded by other leaders in the operations space. I’ve heard him speak at conferences, brainstorm in small group discussions, and share his experiences with others. Given this background, I expected a book full of lessons learned, practical advice, and real world takeaways. Thankfully, for all of us readers, The Art of Capacity Planning delivers all of this and more.
Right out of the gate, John covers a topic near and dear to my heart: metrics. His advice? “Measure, measure, measure.” John reinforces this by including an incredible number of charts throughout the book. He goes on to say that our measurement tools need to provide an easy way to:
- Record and store data over time
- Build custom metrics
- Compare metrics from various sources
- Import and export metrics
As I read the book, I found myself nodding and thinking, “yes, yes, this is exactly what I learned!” Although it’s been more than five years since I was buildmaster for My Yahoo!, the advice John provides really resonated with me, like this gem: “Homogenize hardware to halt headaches”. (You have to love the alliteration, too.)
In a thin book that’s easy to read, John covers a large number of topics. He talks about load testing, with pointers to tools like Httperf and Siege. There are several sections that talk about caching architectures and the use of Squid. He provides guidelines when it comes to deployment, such as making all changes happen in one place, the importance of defining roles and services, and ensuring new servers start working automatically. At the end he even manages to cover virtualization and cloud computing, and how they come into play during capacity planning.
The Art of Capacity Planning is full of sage advice from a seasoned veteran, like this one: “The moral of this little story? When faced with the question of capacity, try to ignore those urges to make existing gear faster, and focus instead on the topic at hand: finding out what you need, and when.” When I read a technical book, I’m really looking for takeaways. That’s why I loved The Art of Capacity Planning, and I think you will, too.
Runtime Page Optimizer
The Runtime Page Optimizer (RPO) is an exciting product from Aptimize. RPO runs on a web server applying performance optimizations to pages at runtime, just before the page is sent to the browser. RPO automatically implements many of the best practices from my book and YSlow, so the guys from Aptimize contacted me and showed me an early version. Here are the performance improvements RPO delivers:
- minifies, combines and compresses JavaScript files
- minifies, combines and compresses stylesheets
- combines images into CSS sprites
- inlines images inside the stylesheet
- turns on gzip compression
- sets far future Expires headers
- loads scripts asynchronously
RPO reduces the number of HTTP requests as well as the amount of data transmitted, resulting in a page that loads faster. The big question is: how much overhead does this add at runtime? RPO caches the resources it generates (combined scripts, combined stylesheets, sprites). The primary realtime cost is rewriting the HTML markup. Static pages, once they’ve been massaged, are also cached. Dynamic HTML can be optimized with only a small slowdown, far less than the time saved by these performance improvements.
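To make the runtime rewriting more concrete, here’s a minimal sketch of one of these transformations: collapsing a run of script tags into a single combined request. This is my own illustration, not Aptimize’s code, and the /combined.js endpoint is hypothetical (it would concatenate, minify, and gzip the listed files, caching the result after the first request).

```javascript
// Conceptual sketch of a runtime optimizer pass: replace runs of two or more
// external scripts with a single combined request.
function combineScripts(html) {
  return html.replace(
    /(?:<script[^>]+src="([^"]+)"[^>]*>\s*<\/script>\s*){2,}/gi,
    function (run) {
      var srcs = [];
      run.replace(/src="([^"]+)"/gi, function (m, src) { srcs.push(src); });
      // hypothetical endpoint that concatenates, minifies, gzips, and caches
      return '<script src="/combined.js?files=' +
             encodeURIComponent(srcs.join(',')) + '"></script>';
    }
  );
}
```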
RPO is available for SharePoint, ASP.NET and DotNetNuke sites, and a version for Apache is in the works. This is an exciting step in the world of web performance, shifting the burden of optimization away from the web developer and making pages fast by default. RPO has just been released, but it’s already being deployed to at least one global top 1000 web site, so stats should be available soon. I can’t wait to see!
Say “no” to IE6
IE6 is a pain. It’s slow. It doesn’t behave well. Things that work in other browsers break in IE6. Hours and hours of web developer time are spent just making things work in IE6. Why do web developers spend so much time just to make IE6 work? Because a large percentage of users (22%, 25%, 28%) are still on IE6.
Why are so many people still using IE6?! IE7 has been out for two years now. IE8 is a great improvement over IE7, but I don’t think people delayed installing IE7 because they were waiting for a better browser. There’s some other dynamic at play here. Most people say it’s employees at companies where IT mandates IE6. That’s true. But there are also other opportunities to upgrade from IE6, outside of these IT-controlled environments.
I just came back from The Ajax Experience where developers were talking about the trials and tribulations of getting their web apps to work on IE6, but there were no discussions about how to put an end to this. I remembered someone had mentioned SaveTheDevelopers.org and their “say no to IE6” campaign. I checked it out and added it to my front page and this blog post. If you’re not on IE6, click here to see how it works.
Let’s all start promoting a program like this. Not only will it encourage individuals to upgrade, but it will also apply pressure to those reluctant IT groups. If the code from SaveTheDevelopers.org isn’t right for your site, by all means, code up a different message. We need to start encouraging users to upgrade to newer browsers so they can enjoy a better browsing experience. And sure, maybe we can get a few more hours of sleep, too.
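If you do roll your own message, it doesn’t take much. Here’s a minimal sketch of a home-grown “please upgrade” notice: it sniffs for IE6 in the User-Agent (excluding Opera, which can masquerade as IE) and inserts a banner at the top of the page. The styling and link target are placeholders.

```javascript
// Show an upgrade notice to IE6 users only (a sketch, not SaveTheDevelopers' code).
// Run this at the end of the body, or from onload, so document.body exists.
if (/MSIE 6\./.test(navigator.userAgent) && !/Opera/.test(navigator.userAgent)) {
  var notice = document.createElement('div');
  notice.style.cssText = 'background:#ffc;border:1px solid #cc0;' +
                         'padding:8px;font:13px arial,sans-serif;';
  notice.innerHTML = 'You appear to be using Internet Explorer 6. Newer ' +
    'browsers are faster and safer. Please consider ' +
    '<a href="http://www.microsoft.com/ie">upgrading</a>.';  // placeholder link
  document.body.insertBefore(notice, document.body.firstChild);
}
```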
Firefox 3.1: raising the bar
Last month I released UA Profiler – tests that automatically measure the performance characteristics of browsers. The thing I really like about this tool is that the web community generates the data. Anyone can point their browser at the site and run the tests. So far 1000+ people have run 2000+ tests on 100 different browsers.
One of the things that UA Profiler revealed was that a regression occurred between Firefox 2 and Firefox 3: Firefox 3 no longer cached redirects. I just found out that in the latest Firefox 3.1 dev builds, called “Minefield 3.1” in its User-Agent string, this regression has been fixed. As a result, the UA Profiler results now show that Firefox (Minefield) 3.1 has taken the lead, capturing 9 out of the 11 performance traits measured. Chrome, Firefox 3.0, and Safari 4 are tied for second with 8 out of 11. Internet Explorer 8, Firefox 2, and Safari 3.1 all have 7 out of 11. Opera, IE6, and IE7 have 5 or fewer of these important traits.
I expect Firefox 3.1 and Chrome to soon come out with parallel script loading. If Chrome adds support for prefetch links, they’ll be tied with Firefox 3.1. Internet Explorer 8 has ground to make up, but I do credit them with being the first in the pack to support parallel script downloads. It’s great to see browser vendors raising the bar when it comes to performance.
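For readers who haven’t run into them, prefetch links are one of the simpler traits on the list: a hint that tells the browser it may fetch a resource during idle time so it’s already cached when a later page needs it. The snippet below is just an illustration (the URL is a placeholder); it’s the scripted equivalent of putting a link tag with rel="prefetch" in the document head.

```javascript
// Hint that a resource will likely be needed by a subsequent page.
var hint = document.createElement('link');
hint.rel = 'prefetch';
hint.href = '/images/results-sprite.png';  // placeholder resource
document.getElementsByTagName('head')[0].appendChild(hint);
```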
Hammerhead: moving performance testing upstream
Today at The Ajax Experience, I released Hammerhead, a Firebug extension for measuring page load times.
Improving performance starts with metrics. How long does it take for the page to load? Seems like a simple question to answer, but gathering accurate measurements can be a challenge. In my experience, performance metrics exist at four stages along the development process.
- real user data – I love real user metrics. JavaScript frameworks like Jiffy measure page load times from real traffic. When your site is used by a large, diverse audience, data from real page views is ground truth. (A bare-bones sketch of this kind of instrumentation follows this list.)
- bucket testing – When you’re getting ready to push a new release, if you’re lucky you can do bucket testing to gather performance metrics. You release the new code to a subset of users while maintaining another set of users on the old code (the “control”). If you sample your user population correctly and gather enough data, comparing the before and after timing information gives you a preview of the latency impact of your next release.
- synthetic or simulated testing – In some situations, it’s not possible to gather real user data. You might not have the infrastructure to do bucket testing and real user instrumentation. Your new build isn’t ready for release, but you still want to gauge where you are with regard to performance. Or perhaps you’re measuring your competitors’ performance. In these situations, the typical solution is to do scripted testing on some machine in your computer lab, or perhaps through a service like Keynote or Gomez.
- dev box – The first place performance testing happens (or should happen) is on the developer’s box. As she finishes her code changes, the developer can see if she made things better or worse. What was the impact of that JavaScript rewrite? What happens if I add another stylesheet, or split my images across two domains?
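Here’s the bare-bones sketch promised above: the core of what real-user instrumentation does, in the spirit of Jiffy but not its actual API. Record a timestamp as early in the page as possible, then beacon the elapsed time back when onload fires. The /beacon URL is hypothetical.

```javascript
// Put this as close to the top of the page as possible.
var t_start = new Date().getTime();

window.onload = function () {
  var t_load = new Date().getTime() - t_start;
  // Report the measurement with an image beacon (a 1x1 request the server logs).
  var beacon = new Image();
  beacon.src = '/beacon?page=' + encodeURIComponent(location.pathname) +
               '&t_load=' + t_load;
};
```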
Performance metrics get less precise as you move from real user data to dev box testing, as shown in Figure 1. That’s because, as you move away from real user data, biases are introduced. For bucket testing, the challenge is selecting users in an unbiased way. For synthetic testing, you need to choose scenarios and test accounts that are representative of real users. Other variables of your real user audience are difficult or impossible to simulate: bandwidth speed, CPU power, OS, browser, geographic location, etc. Attempting to simulate real users in your synthetic testing is a slippery, and costly, slope. Finally, testing on the dev box usually involves one scenario on a CPU that is more powerful than the typical user’s, and an Internet connection that is 10-100 times faster.

Figure 1 - Precision and ability to iterate along the development process
Given this loss of precision, why would we bother with anything other than real user data? The reason is speed of development. Dev box data can be gathered within hours of a code change, whereas it can take days to gather synthetic data, weeks to do bucket testing, and a month or more to release the code and have real user data. If you wait for real user data to see the impact of your changes, it can take a year to iterate on a short list of performance improvements. To quickly identify the most important performance improvements and their optimal implementation, it’s important to improve our ability to gather performance metrics earlier in the development process: on the dev box.
As a developer, it can be painful to measure the impact of a code change on your dev box. Getting an accurate time measurement is the easy part; you can use YSlow, Fasterfox, or an alert dialog. But then you have to load the page multiple times. The most painful part is transcribing the load times into Excel. Were all the measurements done with an empty cache or a primed cache, or was that even considered?
Hammerhead makes it easier to gather performance metrics on your dev box. Figure 2 shows the results of hammering a few news web sites with Hammerhead. By virtue of being a Firebug extension, Hammerhead is available in a platform that web developers are familiar with. To set up a Hammerhead test, one or more URLs are added to the list and the “# of loads” is specified. Once started, Hammerhead loads each URL the specified number of times.

Figure 2 - Hammerhead results for a few news web sites
The next two things aren’t rocket science, but they make a big difference. First, there are two columns of results, one for empty cache and one for primed cache. Hammerhead automatically clears the disk and memory cache, or just the memory cache, in between each page load to gather metrics for both of these scenarios. Second, Hammerhead displays the median and average time measurement. Additionally, you can export the data in CSV format.
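To show why reporting both numbers matters, here’s a small sketch of the kind of aggregation Hammerhead does for each URL (my illustration, not Hammerhead’s code). The average is easily skewed by one slow run; the median gives a more stable “typical” value.

```javascript
// Summarize the load times (in milliseconds) from N runs of a URL.
function summarize(timesMs) {
  var sorted = timesMs.slice(0).sort(function (a, b) { return a - b; });
  var sum = 0;
  for (var i = 0; i < sorted.length; i++) { sum += sorted[i]; }
  var mid = Math.floor(sorted.length / 2);
  var median = (sorted.length % 2) ? sorted[mid]
                                   : (sorted[mid - 1] + sorted[mid]) / 2;
  return { average: sum / sorted.length, median: median };
}

// summarize([1220, 1180, 3400, 1210, 1190])
//   -> average 1640 ms, median 1210 ms; the median shrugs off the one slow run
```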
Even if you’re not hammering a site, other features make Hammerhead a useful add-on. The Cache & Time panel, shown in Figure 3, shows the current URL’s load time. It also contains buttons to clear the disk and memory cache, or just the memory cache. It has another feature that I haven’t seen anywhere else: you can choose to have Hammerhead clear these caches after every page view. This is a nice feature when I’m loading the same page again and again to see its performance in an empty or a primed cache state. If you forget to switch this back, it gets reset automatically the next time you restart Firefox.

Figure 3 - Cache & Time panel in Hammerhead

If you don’t have Hammerhead open, you can still see the load time in the status bar. Right-clicking the Hammerhead icon gives you access to clearing the cache. The ability to clear just the memory cache is another valuable feature I haven’t seen elsewhere. I feel this is the best way to simulate the primed cache scenario, where the user has been to your site recently, but not during the current browser session.
Hammerhead makes it easier to gather performance metrics early in the development process, resulting in a faster development cycle. The biggest bias is that most developers have a much faster bandwidth connection than the typical user. Easy-to-install bandwidth throttlers are a solution. Steve Lamm blogged on Google Code about how Hammerhead can be used with bandwidth throttlers on the dev box, combining ease of use with greater precision of performance measurements. Give it a spin and let me know what you think.
Ugly Doll at the Ajax Experience
My six-year-old daughter snuck her purple hippo into my bag when I went to SXSW in February. It provided a way for us to stay in touch even with time zone differences and busy schedules. I would take pictures of the purple hippo with my iPhone and email them back home. She and the rest of the family could see where the purple hippo was at the conference, and it was likely I was there, too. The photos made for a fun blog post: Purple Hippo at SXSW.
When I arrived at The Ajax Experience, I found her Ugly Doll at the bottom of my suitcase. I smiled. Let the games begin.
First things first, Ugly Doll checked to see if there was free wifi in the room. Answer? Negative. $12.95 later and we started catching up on email. It was late so Ugly Doll checked out the bed. Verdict? Thumbs up.
The first day was great – I attended talks from beginning to end (rare for me). Ben and Dion started with an intro Ajax tutorial – a nice way to ease into the morning. John Resig followed with a comparison of jQuery, Dojo, MooTools, Prototype, and YUI – an even-handed comparison useful for anyone starting the selection process. Ben and Dion had an informative and fun romp through the Ajax Universe. This was the best talk of the day for me. They spiced it up by switching speakers based on a timer that went off at random intervals ranging from 10 seconds to 2 minutes. Ojan Vafai presented Google Chrome. The day ended with PPK leading a panel of representatives from the major JavaScript libraries: John Resig (jQuery), Dylan Schiemann (Dojo), Christian Heilmann (YUI), and Andrew Dupont (Prototype).
Photos: Dion Almaer and Ben Galbraith, Christian Heilmann, John Resig.
The highlight of the second day was the lightning talks. Stoyan Stefanov and Nicole Sullivan announced Smush.it – an image optimization tool that looks extremely useful. I did a talk about Episodes – a proposal for measuring performance in Web 2.0 apps. Bill Scott talked about the upcoming release of the Netflix API, and Gavin Doughtie talked about interviewing JavaScript gurus. Brendan Eich delivered a keynote on JavaScript speed, and PPK gave a good presentation on browser incompatibility patterns.
Photos: Bill Scott, Gavin Doughtie, Doug Crockford.
There’s still another day of TAE remaining, but I have to get this post out today to leave room for a new post tomorrow. I’m looking forward to Ask The Experts with Brendan Eich, Douglas Crockford, John Resig, Joe Walker and Dylan Schiemann. On Wednesday I deliver a talk that includes a release of a new tool. Stay tuned!
First Week of Classes
I finished my first week teaching CS193H High Performance Web Sites at Stanford. It went well. The class is settling in at 35 students plus another 10 watching remotely through the Stanford Center for Professional Development. It’s 2/3 undergrad, 1/3 grad. The students are smart and ask great questions.
I’ve laid out the class schedule for the quarter. I post the slides before each lecture. Here are links to the slides from this week’s classes:
- Introduction (ppt, GDoc)
- The Importance of Frontend Performance (ppt, GDoc)
- HTTP and Web 100 Performance Profile (ppt, GDoc)
Today we kicked off the first class project. Each student picks five top web sites and records different performance stats in our Web 100 Performance Profile spreadsheet. We’ll update this again at the end of the quarter and analyze any trends.
I’m traveling to Boston next week for The Ajax Experience and won’t be able to teach class. I’m excited to say I’ve lined up three great industry leaders, speakers, and friends to appear as guest lecturers. Joseph Smarr (Plaxo) is doing a talk on “Performance Challenges for the Open Web”. Lindsey Simon (Google) is going to provide an introduction to CSS and talk about the balance between design and performance. Bill Scott (Netflix) has prepped a talk entitled “High Performance Examples from the Real World”, sure to contain great anecdotes and lessons learned from his work at Yahoo! and Netflix. Luckily the class is videotaped, so I can watch their classes when I get back.
See you in Boston!
CS193H: High Performance Web Sites
This week I start teaching CS193H, “High Performance Web Sites”, at Stanford. I’ve evangelized fast web pages at conferences and tech companies, most recently Twitter and LinkedIn. Teaching at the university level is a natural progression in evangelizing what I’ve learned (and continue to learn) about web performance.
I’ve always thought the format of my book lent itself to a lecture series, where each lecture would cover one of the performance best practices. I’ve done 3 hour workshops before, but this class meets three days a week for 50 minutes from now through December 5. Do I really have enough material to fill that much time?
I laid out the class schedule and feel comfortable that there is enough content – I actually had to cut a few classes to allow room for the class projects. I’ll post my slides on the class schedule so others can follow along. I’m planning two projects. Web 100 Performance Profile will be a record of the performance traits of the top web sites in the world. Students will pick five web sites and fill in the performance traits by running YSlow. We’ll gather stats at the beginning and end of the quarter, and compare them to see how (if?) the industry is improving. Improving a Top 100 Web Site is the class project that runs through the entire quarter. Students adopt one web site and implement each performance improvement to a local copy of the static code. We’ll track how much impact each individual improvement has on load time as well as other performance characteristics (total size, number of requests, etc.). At the end of the quarter I’ll share the results from both projects.
So tomorrow’s my first day of school! I need to get to bed and get a good night’s sleep. I’ll post an update after I have a few classes under my belt and let you know how it’s going. Wish me luck!
“Delayed Script Execution” in Opera
I’ve recently been evangelizing techniques for loading external scripts without blocking the rest of the page. A co-worker at Google pointed out a little-known option in Opera, Delayed Script Execution, which is documented on Opera’s support web site.
This option is handy for developers today, and provides a guideline for how script deferral should be built into all browsers.
Loading Scripts Without Blocking and document.write
There are several advanced techniques for loading scripts without blocking other downloads and rendering. At recent conferences, I’ve talked about six alternatives for asynchronous script loading. (See slides 14-27 from my OSCON slide deck, for example.)
One limitation of these techniques is that you can’t use document.write, because when a script is loaded asynchronously the browser has already written the document. Hardcore JavaScript programmers avoid document.write, but it’s still used in the real world, most notably and infamously by ads. A feature of Opera’s “Delayed Script Execution” option is that, even though scripts are deferred, document.write still works correctly. Opera remembers the script’s location in the page and inserts the document.write output appropriately.
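For reference, here is the simplest of those six alternatives, the “script DOM element” technique: create a script element from JavaScript and append it to the document, and the browser downloads it without blocking other downloads or rendering. (The URL is just a placeholder.) The catch, as described above, is what happens when that script calls document.write.

```javascript
// Load a script asynchronously by inserting a script element via the DOM.
var script = document.createElement('script');
script.src = 'http://anydomain.com/A.js';  // placeholder URL
document.getElementsByTagName('head')[0].appendChild(script);
```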
An Example
I created a test page that demonstrates how this feature would benefit all browsers. The test page contains a script followed by eight images. The script is programmed to take 4 seconds to download and 4 seconds to execute. The images are each programmed to take 1 second to download. In most browsers, the default behavior is that the browser starts downloading and executing the script first, since it occurs first in the page. The images have to wait 8 seconds for the script to finish before they are downloaded and rendered. The text in the page is also blocked from rendering for 8 seconds, even though it’s already been downloaded before the script. This is because browsers block rendering of anything (text and resources) below a script until the script is done downloading and executing. Let’s look at how this page behaves in popular browsers.
- IE 6,7,8
- The 8 second script blocks everything else in the page until it’s done downloading and executing. In IE 6 and 7 the overall page load time is ~12 seconds (8 seconds for the script plus 4 seconds for the images to load over two connections). In IE 8 it’s ~10 seconds (8 seconds for the script plus 2 seconds for the images to load over six connections). IE has its own asynchronous script loading technique – the DEFER attribute. Clicking the “Load with Defer” button adds this attribute to the SCRIPT tag in the page. The results are disappointing – because the script contains document.write, the entire page is overwritten. The HTML 4.01 spec notes that DEFER should only be used when “the script is not going to generate any document content (e.g., no ‘document.write’ in javascript)”. This example proves that IE’s built-in mechanism for script deferral should not be used for scripts that contain document.write.
- Firefox 2,3
- Loading the page in Firefox 2 and 3 produces results similar to IE — the script blocks the images. In Firefox 2 the overall page load time is ~12 seconds (8 seconds for the script plus 4 seconds for the images to load over two connections). In Firefox 3 it’s ~10 seconds (8 seconds for the script plus 2 seconds for the images to load over six connections). Since Firefox doesn’t support the DEFER attribute, you can click the “Load Dynamically” button to load the script using one of the advanced non-blocking techniques (“script DOM element”). The page starts off loading great — the entire text is rendered immediately and the images start downloading and rendering quickly, all because the script is being loaded asynchronously. But when the script finally finishes and does its document.write, the entire page is overwritten, just as in IE. Scripts containing document.write can’t be loaded asynchronously in Firefox. (The sketch after this browser rundown shows why.)
- Safari 526
- The script and eight images are downloaded in parallel. (Parallel script loading is a new feature in Safari 526.) But rendering is blocked for the entire 8 seconds while the script is downloaded and executed. Overall page load time is ~8 seconds, since the images and script are downloaded in parallel. To try loading the script asynchronously, click on the “Load Dynamically” button. The text is rendered immediately and the images start downloading and rendering quickly, all because the script is being loaded asynchronously. But when the script finishes and does its document.write, the output is not shown in the page. Scripts containing document.write won’t have their output shown in the page if loaded asynchronously in Safari.
- Opera 9.5
- Without enabling the “Delayed Script Execution” option, the page load experience is similar to the other browsers: the script is downloaded first, blocking the image downloads and text rendering for 8 seconds. The overall page load is a little faster, ~9 seconds, because Opera supports eight simultaneous connections, so the eight images download in 1 second. Enabling “Delayed Script Execution” provides an exciting alternate experience. The entire page’s text is rendered immediately. The eight images start downloading and rendering right away. In the background the 8-second script is downloading, so the overall page load time is ~8 seconds. The icing on the cake is that when the script is done executing, its document.write output appears where it should – a purple DIV in the box at the top. It doesn’t overwrite the page and it doesn’t disappear. It goes right where it was supposed to go.
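A note on why the page gets wiped out in the IE and Firefox cases above: once a document has finished loading, a call to document.write implicitly calls document.open, which discards the existing page before writing. So if an asynchronously loaded script contains a line like the following (a simplified stand-in for the test page’s script), that single line of output replaces the entire page.

```javascript
// After onload, this implicitly calls document.open and clears the page first.
document.write('<div style="background: purple;">output from the deferred script</div>');
```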
What’s all the excitement?!
One immediate benefit of this Opera preference is that web developers can see the impact of delay-loading their JavaScript. A practice I’m advocating a lot lately is splitting a large JavaScript payload into two pieces, one of which can be loaded using an asynchronous script loading technique. This is often a complex task as the JavaScript payload grows in size and complexity. With this “Delayed Script Execution” feature in Opera, developers can get an idea of how their page would feel before undertaking the heavy lifting.
I’m even more excited about how this shows us what is possible for the future. To be able to have asynchronous script loading and preserve document.write output is like having your cake and eating it too. It’s difficult for users to find this feature in Opera. And it’s beyond the reach of web developers. But if Opera’s “Delayed Script Execution” behavior were the basis for implementing SCRIPT DEFER in all browsers, it would open the door for significant performance improvements by simply adding six characters (“DEFER ”).
This is most significant for the serving of ads. Often ads are served by including a script that contains document.write to load other resources: images, flash, or even another script. Ads are typically placed high in the page, which means today’s pages suffer from slow-loading ads because all the content below them gets blocked. And really, it’s not the pages that suffer, it’s the users. Our experience suffers. Everyone’s experience suffers. If browsers supported an implementation of SCRIPT DEFER that behaved similar to Opera’s “Delayed Script Execution” feature, we’d all be better off.
Food for thought for Safari, Firefox, and IE.
UA Profiler and Google Chrome
The night before it launched, I sat down to analyze Google Chrome from a performance perspective – what good performance traits did it have and which ones was it missing? I couldn’t get to the internal download links and had to resign myself to waiting for the official launch the next day.
Or did I?
Instead of watching Season 2 of Dexter or finally finishing Thunderstruck, I stayed up late writing some tests that could automatically poke at Chrome to determine its performance personality. It worked great, and made me want to run the same analysis for other browsers, and open it up to the web community to measure any browser of interest. A week later, I’m happy to announce the release of UA Profiler.
UA Profiler looks at browser characteristics that make pages load faster, such as downloading scripts without blocking, the maximum number of open connections, and support for “data:” URLs. The tests run automatically – all you have to do is navigate to the test page from any browser with JavaScript support. The results are available to everyone, regardless of whether you’ve run the tests. See the FAQ for a description of each test and my list of caveats (additional tests that are needed, limited browser detection, etc.). Here are the overall scores for some popular browsers.
| Browser | UA Profiler Score |
|---|---|
| Chrome | 8/11 |
| Firefox 2 | 7/11 |
| Firefox 3 | 8/11 |
| IE 6,7 | 4/11 |
| IE 8 | 7/11 |
| Opera 9.5 | 5/11 |
| Safari 526 | 8/11 |
Getting back to where I started, let’s look at Chrome’s performance profile. It’s nice to see that Chrome is at the top right out of the chute, having 8 of the 11 performance attributes tested, tied with Firefox 3 and Safari 526. The three performance features Chrome is missing are loading scripts and other resources in parallel, not blocking downloads when a stylesheet is followed by an inline script, and support for prefetch links. Looking at the Chrome user agent string we can see it’s based on Safari 525. If Chrome incorporates Safari 526’s improvements, it’ll pick up the most important missing performance trait – loading scripts in parallel. The detailed results show that Chrome has one improvement over Safari: caching redirects that have an expiration date in the future.
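For a sense of how lightweight these tests can be, here’s a sketch in the spirit of UA Profiler’s approach (my illustration, not the actual test code): detecting the “data:” URL trait by loading a tiny inline GIF and seeing whether onload or onerror fires.

```javascript
// Test support for "data:" URLs by loading a 1x1 transparent GIF inline.
function testDataUrls(report) {
  var img = new Image();
  img.onload  = function () { report('data: urls', true);  };
  img.onerror = function () { report('data: urls', false); };
  img.src = 'data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///' +
            'yH5BAEAAAAALAAAAAABAAEAAAIBRAA7';
}

// UA Profiler beacons its results back to the server; here we just display them.
testDataUrls(function (trait, supported) {
  alert(trait + (supported ? ' supported' : ' not supported'));
});
```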
UA Profiler will continue to evolve. I have a handful of tests to add and there are several GUI improvements to be made. The main way it’ll grow is by people navigating to the tests with their own browsers, especially browsers that aren’t currently in the list. On each page there are links for contacting me about bugs, mistakes, and suggestions. I look forward to your feedback.