SpriteMe (part 3)
This is the third of three blog posts about SpriteMe. The first post, SpriteMe makes spriting easy, reviews SpriteMe’s features with a summary of the performance savings across the Alexa U.S. top 100 web sites. The second post, SpriteMe (part 2), talks about my experience at WordCamp that motivated SpriteMe and the logic behind how SpriteMe makes its recommendations for creating sprites. This post discusses issues around sprite file size (and memory size) and wraps up with thoughts on major next steps.
File (and Memory) Size
In SpriteMe makes spriting easy, I showed how running SpriteMe on the Alexa U.S. Top 100 eliminated eight HTTP requests on average, but resulted in a 6K increase in total size of background images. Although this is a beneficial tradeoff (8 fewer requests in exchange for 6K more data), I’d prefer to see a reduction in overall size. After all, combining images into a single sprite reduces the amount of image formatting information, and frequently produces a savings.
The primary reason SpriteMe produces an overall size increase is colors. Images with 255 colors or fewer can be saved with a smaller color palette, and so have a smaller file size. Currently, SpriteMe has no knowledge of how many colors, or which colors, an image uses. It’s possible for SpriteMe to combine two images, each with 255 colors or fewer, into a sprite that exceeds 255 colors, resulting in an overall increase in size. In the future, I plan on having SpriteMe use a web service to provide color palette information (issue #17). Until then, if SpriteMe produces a significant size increase, you can manually rearrange the images to stay within the 255 color threshold. Try grouping similarly colored images together. Fortunately, SpriteMe makes it easy to try different combinations and immediately see the size of the resulting sprite.
Another size concern has to do with memory. As Vladimir Vukićević points out in his post To Sprite Or Not To Sprite, although an image file can be highly compressed, most browsers render an image based on its dimensions, allocating the same amount of memory for each pixel. The image he cites as an example is a 1299×15000 PNG. The download size is only 26K, but because of the huge dimensions the image consumes 75MB of memory when loaded in Internet Explorer, Firefox, Safari, and Chrome. (Not Opera, however – hmmmm.) Luckily, I’ve never seen SpriteMe recommend a sprite even close to this size. Even so, SpriteMe currently doesn’t “pack” images when it lays them out in the sprite, but that is on SpriteMe’s future feature list (issue #53). If SpriteMe generates a sprite with extremely large dimensions, try grouping narrow images in one sprite, and wide images in a different sprite.
Big Ticket Items
SpriteMe has about 50 open issues currently, but these are the major features I look forward to seeing added to SpriteMe:
- export modified CSS
- The #1 requested feature is an easier way to export the CSS changes (background-image and background-position). Ideally, SpriteMe would display the exact rule with the appropriate modifications. But, as with many things, the devil is in the details. The CSS changes might occur across multiple rules. Tracking down the exact rules is challenging, given how styles cascade. Finally, the rule’s cssText available through the DOM is unlikely to match the original CSS, and yet developers want something as close to their original code as possible, to make it easier to diff. And, of course, the way to do this differs across browsers. This isn’t impossible, but it’s going to be very hard.
- create foreground IMG sprites
- SpriteMe eliminates eight HTTP requests on average across the Alexa U.S. top 100 web sites. For some sites, over thirty requests are eliminated. But an even greater potential for savings can be had by spriting normal “content images”. These are the images embedded in the page using the IMG tag. This is possible, and I’ve seen web sites that do it, but it adds a fair amount of extra complexity. The IMG tag must be wrapped by a DIV with style “position: relative; display: inline-block; overflow: hidden; width: <img-width>; height: <img-height>;”. Then the IMG’s style is set to “position: absolute; left: <sprite-offset-x>; top: <sprite-offset-y>;”. I need to test this more, gauge what the potential benefits are, and see if it can be done in an intuitive way. (A sketch of this wrapper technique appears after this list.)
- sprite image editor
- Creating and testing sprites is a challenge, but maintaining sprites is also difficult. One idea is to retain a memory of the images that were combined into the sprite for more intelligent editing later on. For example, for each background image added to a sprite, its original URL and coordinates (upper-left and lower-right) can be recorded. This information would be saved in a separate manifest file or added to the sprite itself as comments. Later, a special sprite image editor could make it easy to refresh or replace one of the background images, without affecting any of the other images in the sprite.
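To make the foreground IMG idea above more concrete, here’s a minimal sketch in JavaScript of the wrapper technique described under “create foreground IMG sprites”. The function name, element, sprite URL, and offsets are made up for illustration; this is not SpriteMe code.

```javascript
// Minimal sketch of spriting a foreground IMG (hypothetical names and offsets).
// The IMG is wrapped in a DIV sized to the original image, and the IMG itself
// is absolutely positioned so only the desired region of the sprite shows.
function spriteForegroundImage(img, spriteUrl, offsetX, offsetY) {
  var wrapper = document.createElement("div");
  wrapper.style.position = "relative";
  wrapper.style.display = "inline-block";
  wrapper.style.overflow = "hidden";
  wrapper.style.width = img.width + "px";
  wrapper.style.height = img.height + "px";

  img.parentNode.replaceChild(wrapper, img);
  wrapper.appendChild(img);

  img.src = spriteUrl;              // point the IMG at the sprite
  img.style.position = "absolute";
  img.style.left = -offsetX + "px"; // shift the sprite so the right region
  img.style.top = -offsetY + "px";  // lines up inside the clipping wrapper
}

// e.g. spriteForegroundImage(document.getElementById("logo"), "sprite.png", 0, 120);
```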
SpriteMe’s ability to modify the live page is a powerful feature. This speeds up the development process by promoting faster iterations. I’d like to find more opportunities for this style of development. One example, somewhat related, could be with image optimization. Page Speed compresses the page’s images on-the-fly using lossless techniques. A button that swapped out the bloated images for the optimized images would allow developers to verify the quality of the images before updating their files.
Spriting is a difficult task. SpriteMe makes the job easier. We’re seeing more and more tools that help developers build high performance web sites. Add SpriteMe to your developer toolkit, and stay tuned for a rash of posts about other performance tools that have recently entered the scene.
SpriteMe (part 2)
This is the second of three blog posts about SpriteMe. The first post, SpriteMe makes spriting easy, reviews SpriteMe’s features with a summary of the performance savings across the Alexa U.S. top 100 web sites. This post talks about my experience at WordCamp that motivated SpriteMe and the logic behind how SpriteMe makes its recommendations for creating sprites. The third post, SpriteMe (part 3), discusses issues around sprite file size (and memory size) and wraps up with thoughts on major next steps.
The Motivation
I had a cool idea for my talk at WordCamp this past May: I would optimize a WordPress theme live onstage. The night before (says the procrastinator), I sat down to do a dry run. It only took me about 30 minutes to do most of the performance optimizations (Apache cache and compression settings, move scripts to the bottom, commandline YUI Compressor, etc.).
The final optimization was creating CSS sprites from the various background images in the theme. Here’s the process I went through:
- Run YSlow to find the CSS background images.
- Download the images and zip them.
- Create the sprited image using CSS Sprite Generator.
- Download the sprited image to my server.
- Edit the CSS that used one of the old background images to use the new sprite image and add a background-position using the offsets provided by Sprite Generator.
- Reload the page.
- Visually determine which backgrounds were broken.
- Repeat parts of steps 2-7 multiple times trying to find the right combination of images and background-position values.
Hours later, I had figured out a fairly good number of images to sprite together and the necessary CSS changes to retain the page’s original rendering. But it had taken waaaaay too long. That’s when I decided to create SpriteMe.
Spriting Logic
The hardest part of building SpriteMe was figuring out which background images could be sprited, and how they could be combined. Most of what I now know about sprites I learned in the last few months, so it’s important for me to document this logic and encourage CSS gurus to give feedback. Here’s the high-level logic that spriteme.js currently uses behind its sprite suggestions:
- skip images that repeat-x AND repeat-y
- Images that repeat both vertically and horizontally can’t be sprited because neighboring images would be visible.
- skip images that are very “short” and “narrow”
- A “short” background image has a height that is smaller than the height of its containing block. Similarly, the width of a “narrow” image is smaller than the width of its containing block. An image that is short and narrow can be sprited as long as there’s enough padding (whitespace) around it to make up the difference between it and its containing block. If this padding is too large, adding a short and narrow image to a sprite might result in an increase in the size of the sprite image (both file size and memory footprint) that negates the benefits of spriting. Background images that are only short or only narrow can be sprited and are addressed below.
- sprite repeat-x images of similar width together vertically
- The key with horizontally repeating images is that there can’t be any padding to the left and right of the image – anything on the sides will bleed through with each repeat. Thus, repeat-x images can be sprited in a vertically-stacked sprite if the sprite is the same width as the image. SpriteMe started by only spriting repeat-x images with the same width. I expanded that to enable spriting images of different widths, as long as the least common multiple of their widths is 20px or less. For example, repeat-x images with widths of 2, 3, and 9 can be sprited in an image 18px wide. This is done by replicating the 2px image nine times, the 3px image six times, and the 9px image twice. (Repeat-x images wider than 20px can still be sprited together, but only with other repeat-x images of the exact same width.) A sketch of this width check appears after this list.
- sprite repeat-y images of similar height together horizontally
- Repeat-y images of similar height are sprited together just as repeat-x images of similar width are sprited together. (See the previous bullet.)
- sprite “short” images together horizontally
- Imagine a background image (such as a watermark) that is dramatically shorter than the containing block – maybe the background image is 10px high and the containing block is 500px high. This image can’t be sprited vertically – it would require 490px of padding above or below it. But it can be sprited horizontally. (These images are not narrow and do not repeat – those cases fit into one of the prior bullets.)
- sprite everything else vertically
- Background images that don’t fall into any of the above categories can be sprited vertically. These are images that don’t repeat AND are not short. In summary, they are about the same size as (or smaller than) their containing block, or they’re narrow.
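As promised above, here’s a minimal sketch of the width check for repeat-x images: compute the least common multiple of the widths and accept the grouping only if it stays at or under the 20px threshold. The function names are made up; this is illustrative, not SpriteMe’s actual implementation.

```javascript
// Greatest common divisor and least common multiple helpers.
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

function lcm(a, b) {
  return (a * b) / gcd(a, b);
}

// Return the width of a vertically-stacked sprite that can hold all of these
// repeat-x images (each image gets replicated across that width), or 0 if
// their least common multiple exceeds the threshold.
function repeatXSpriteWidth(widths, maxWidth) {
  maxWidth = maxWidth || 20;    // the 20px threshold described above
  var spriteWidth = widths[0];
  for (var i = 1; i < widths.length; i++) {
    spriteWidth = lcm(spriteWidth, widths[i]);
    if (spriteWidth > maxWidth) {
      return 0;                 // too wide - don't combine these
    }
  }
  return spriteWidth;
}

repeatXSpriteWidth([2, 3, 9]);  // 18 - replicate 9x, 6x, and 2x respectively
repeatXSpriteWidth([7, 11]);    // 0  - lcm is 77, over the threshold
```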
The logic above catches most of the opportunities for spriting, but I’ve thought of a few more situations where it might be possible to eliminate one or more additional image requests:
- Currently, SpriteMe puts repeat-x images in one sprite, and non-repeating images in a different (vertically-stacked) sprite, and never the twain shall meet. But repeat-x images could be added to the non-repeating sprite so long as they were the same width. (The repeat-x images could be replicated to achieve an equal width as described previously.) This would result in one fewer sprite image. The same could be done with repeat-y images and a non-repeating horizontally-stacked sprite.
- Images that are exceedingly “short” aren’t sprited. But if one such image is positioned at the bottom of its containing block, it could be placed at the top of a vertically-stacked sprite. Similarly, if it’s positioned at the top of its containing block, it could be placed at the bottom of a vertically-stacked sprite. This logic could be applied to “narrow” images, too, placing them to the left or right of a horizontally-stacked sprite. In the best case, this could eliminate an extra four HTTP requests.
- I’ve seen some images styled to repeat, but their containing block is the same size or smaller than the background image. These could potentially be sprited as if they weren’t repeating.
Finally, SpriteMe doesn’t sprite these images:
- jpegs – It’s totally feasible to sprite jpeg images, but while testing SpriteMe on popular web sites I found that the resulting sprite was much larger than the total size of the original background images (50K or more). A common cause was combining jpegs with images having 255-or-fewer colors. Until SpriteMe gets knowledge about color palette (issue #17), it skips over jpeg images.
- firewalled images – SpriteMe uses the coolRunnings service to create the sprite. This is done by sending the background image URLs to the coolRunnings service. If the URL is behind a firewall or otherwise restricted, it can’t be sprited.
- 0x0 elements – When generating sprite suggestions, SpriteMe examines the height and width of DOM elements with a background image. In some cases (drop down menus, popup DIVs), these DOM elements haven’t been rendered and have a zero height and width. If all of its DOM elements have a zero height and width, that background image is not sprited.
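For the 0x0 case, this is roughly the kind of check involved (a sketch, not SpriteMe’s exact code), assuming you already have the list of DOM elements that use a given background image:

```javascript
// Return true if every element using this background image is unrendered
// (zero width and height), e.g. a hidden drop down menu or popup DIV.
// "elements" is assumed to be the list of DOM elements referencing the image.
function allElementsUnrendered(elements) {
  for (var i = 0; i < elements.length; i++) {
    if (elements[i].offsetWidth > 0 || elements[i].offsetHeight > 0) {
      return false; // at least one element is rendered, so keep the image
    }
  }
  return true; // all 0x0 - skip this background image when spriting
}
```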
This is a thorough (i.e., lengthy) deep dive into the rules of spriting. I plan on expanding this write-up with figures as part of the SpriteMe documentation. My hope is that the logic behind SpriteMe will eventually be used in tools that make web sites fast by default, like Aptimize and Strangeloop Networks. But using this logic in production environments is still a ways off. That’s why it’s important to capture this logic in this post and in SpriteMe, and to solicit your feedback.
In my next blog post about SpriteMe, I’ll talk about ideas for reducing the sprite file size, and big items on the todo list.
SpriteMe makes spriting easy
I’m excited to announce SpriteMe – a tool that makes it easy to create CSS sprites. It’s a bookmarklet, so it runs in Internet Explorer, Firefox, Safari, and Chrome. Just drag&drop this link to your bookmarks toolbar or right-click to add it to your favorites: SpriteMe.
A CSS sprite combines multiple background images into a single image. Spriting makes web pages faster because it reduces the number of HTTP requests required to load a web page. The problem is, creating sprites is hard, involving arcane knowledge and lots of trial and error. SpriteMe removes the hassles with the click of a button. Try the tutorial to see SpriteMe in action.
Features
Other sprite tools (like CSS Sprite Generator and CSS Sprites generator) are useful for combining multiple background images into a single sprite image. This is a big help, and I’ve frequently used these tools built by Stuart and Ed, and Stoyan. However, the bigger challenges when creating sprites are finding the background images and knowing how to group them into sprites. Once the sprite image is created, testing the sprite, including changing all the CSS, is time consuming. SpriteMe addresses all of these challenges. Here’s a rundown of SpriteMe’s features.
Finds Background Images: SpriteMe makes it easy to examine the CSS background images used in the current page by listing all the background images and the elements that use them. Hovering over an image’s URL displays that background image. Clicking on one of the image’s DOM elements highlights that element in the page.

Groups Images into Sprites: The logic behind spriting is complex: Background images that repeat horizontally can’t be sprited with images that repeat vertically. Repeating images that are sprited together must be the same width or height, depending on whether they repeat-x or repeat-y, respectively. Background images that are shorter than their containing block should go at the top of a vertically-stacked sprite if their position is “bottom”, but they should go at the bottom if their position is “top”. It’s complex – and this is just the beginning of the logic that’s required to avoid mistakes. SpriteMe contains this spriting logic and uses it to suggest which images to combine into a sprite. Figure 1 shows the varied width background images that SpriteMe recommends combining into a vertically-stacked sprite for eBay.
Generates the Sprite: Clicking “make sprite” combines the images into a sprite using the coolRunnings service from Jared Hirsch. No zipping, no uploading form – it just happens. Figure 2 shows the newly created sprite: spriteme1.png.

Recomputes CSS background-position: Now comes the math. In Figure 1, the background-position for the #Buy element was “0% 100%”. Most sprite generation tools provide background-positions assuming every element starts off being “0% 0%”. Looking at the Alexa U.S. top 10, only 12% (56/459) of the elements that use a background image have a “0% 0%” background-position. More automated help is needed.
SpriteMe, because it runs in the web page, is able to convert the old background-position (“0% 100%”) to the exact offsets needed to use the sprite: “-10px -27px”. Converting “0%” is easy – the sprite has 10px of padding on the left, so must be offset by -10px. The “100%” conversion is harder. The original background image was 45px high. “100%” means the bottom of the image was aligned with the bottom of the DOM element. The DOM element is 28px high (as indicated by “40×28” shown in Figure 2), so we want the 17th pixel of the image to align with the top of the DOM element. This requires an offset of -17px, plus the 10px of padding results in an offset of -27px. Easy, right? Thank goodness for SpriteMe.
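Here’s the #Buy arithmetic captured as a small function. It assumes the image is at least as tall (or wide) as its element, which is the only case where a percentage position can be reproduced from inside a sprite; the numbers are the ones from Figure 2, and the function name is made up for illustration.

```javascript
// Convert a percentage background-position into the pixel offset needed once
// the image sits inside a sprite with some padding before it.
//   percent     - the original background-position value (e.g. 0 or 100)
//   padding     - whitespace before the image inside the sprite, in px
//   imageSize   - the original background image's width or height, in px
//   elementSize - the DOM element's width or height, in px
// Assumes imageSize >= elementSize.
function spriteOffset(percent, padding, imageSize, elementSize) {
  // "P%" aligns the P% point of the image with the P% point of the element.
  var align = (percent / 100) * (imageSize - elementSize);
  return -(padding + align) + "px";
}

// The #Buy example from Figure 2:
spriteOffset(0, 10, 40, 40);   // "-10px"  horizontal "0%" (image width irrelevant at 0%)
spriteOffset(100, 10, 45, 28); // "-27px"  vertical "100%": 45px image in a 28px element
```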
Injects the Sprite into the Current Page: As developers, we know it’s critical to optimize the edit-save-test cycle in whatever we’re doing. In the past, the edit-save-test cycle for spriting was painfully long: regenerating the sprite image, saving it back to the server’s “/images/” directory, editing and saving the CSS, clearing the cache, and reloading the page. SpriteMe bypasses all of that by modifying the live page as you watch. In Figure 2, the #Buy element’s CSS has been modified by SpriteMe to use the new sprite, spriteme1.png, along with the new background-position, “-10px -27px”. It might be hard to get excited – success means the page’s appearance doesn’t change at all! But injecting the sprite into the live page means developers can create sprites and verify they work in minutes instead of hours.
The Savings
In all the excitement about features, I almost forgot about the most important part: how much does SpriteMe help reduce HTTP requests? The savings across the Alexa U.S. Top 100 show that SpriteMe replaces 9 background images with 1 sprite on average, for an overall reduction of 8 HTTP requests per page. These stats are tracked using SpriteMe’s “share your results” feature – after running SpriteMe, you can choose to send the savings to SpriteMe’s Savings page. This savings of 8 HTTP requests comes with an average increase of 6K in image size. This is a worthwhile trade-off, but ideas for decreasing the sprite file size are in the works.
SpriteMe is an open source project, with all the usual links: Code, Group, Bugs, and Contact. In the second blog post, SpriteMe (part 2), I talk about my experience at WordCamp that motivated SpriteMe and the logic behind how SpriteMe makes its recommendations for creating sprites. The third post, SpriteMe (part 3), discusses issues around sprite file size (and memory size) and wraps up with thoughts on major next steps.
Doloto: JavaScript download optimizer
One of the speakers at Velocity 2008 was Ben Livshits from Microsoft Research. He spoke about Doloto, a system for splitting up huge JavaScript payloads for better performance. I talk about Doloto in Even Faster Web Sites. When I wrote the book, Doloto was an internal project, but that all changed last week when Microsoft Research released Doloto to the public.
The project web site describes Doloto as:
…an AJAX application optimization tool, especially useful for large and complex Web 2.0 applications that contain a lot of code, such as Bing Maps, Hotmail, etc. Doloto analyzes AJAX application workloads and automatically performs code splitting of existing large Web 2.0 applications. After being processed by Doloto, an application will initially transfer only the portion of code necessary for application initialization.
Anyone who has tried to do code analysis on JavaScript (a la Caja) knows this is a complex problem. But it’s worth the effort:
In our experiments across a number of AJAX applications and network conditions, Doloto reduced the amount of initial downloaded JavaScript code by over 40%, or hundreds of kilobytes resulting in startup often faster by 30-40%, depending on network conditions.
Hats off to Ben and the rest of the Doloto team at Microsoft Research on the release of Doloto. This and other tools are sorely needed to help with some of the heavy lifting that now sits solely on the shoulders of web developers.
back in the saddle: EFWS! Velocity!
The last few months are a blur for me. I get through stages of life like this and look back and wonder how I ever made it through alive (and why I ever set myself up for such stress). The big activities that dominated my time were Even Faster Web Sites and Velocity.
Even Faster Web Sites
Even Faster Web Sites is my second book of web performance best practices. This is a follow-on to my first book, High Performance Web Sites. EFWS isn’t a second edition; it’s more like “Volume 2”. Both books contain 14 chapters, each chapter devoted to a separate performance topic. The best practices described in EFWS are completely new.
An exciting addition to EFWS is that six of the chapters were contributed by guest authors: Doug Crockford (Chap 1), Ben Galbraith and Dion Almaer (Chap 2), Nicholas Zakas (Chap 7), Dylan Schiemann (Chap 8), Tony Gentilcore (Chap 9), and Stoyan Stefanov and Nicole Sullivan (Chap 10). Web developers working on today’s content rich, dynamic web sites will benefit from the advice contained in Even Faster Web Sites.
Velocity
Velocity is the web performance and operations conference that I co-chair with Jesse Robbins. Jesse, former “Master of Disaster” at Amazon and current CEO of Opscode, runs the operations track. I ride herd on the performance side of the conference. This was the second year for Velocity. The first year was a home run, drawing 600 attendees (far more than expected – we only made 400 swag bags) and containing a ton of great talks. Velocity 2009 (held in San Jose June 22-24) was an even bigger success: more attendees (700), more sponsors, more talks, and an additional day for workshops.
The bright spot for me at Velocity was the fact that so many speakers offered up stats on how performance is critical to a company’s business. I wrote a blog post on O’Reilly Radar about this: Velocity and the Bottom Line. Here are some of the excerpted stats:
- Bing found that a 2 second slowdown caused a 4.3% reduction in revenue/user
- Google Search found that a 400 millisecond delay resulted in 0.59% fewer searches/user
- AOL revealed that users that experience the fastest page load times view 50% more pages/visit than users experiencing the slowest page load times
- Shopzilla undertook a massive performance redesign reducing page load times from ~7 seconds to ~2 seconds, with a corresponding 7-12% increase in revenue and 50% reduction in hardware costs
I love optimizing web performance because it raises the quality of engineering, reduces inefficiencies, and is better for the planet. But to get widespread adoption we need to motivate the non-engineering parts of the organization. That’s why these case studies on web performance improving the user experience as well as the company’s bottom line are important. I applaud these companies for not only tracking these results, but being willing to share them publicly. You can get more details from the Velocity videos and slides.
Back in the Saddle
Over the next six months, I’ll be focusing on open sourcing many of the tools I’ve soft launched, including UA Profiler, Cuzillion, Hammerhead, and Episodes. These are already “open source” per se, but they’re not run as active projects with a code repository, bug database, roadmap, and active contributors. I plan on fixing that and will discuss it more during my presentation at OSCON this week. If you’re going to OSCON, I hope you’ll attend my session. If not, I’ll also be signing books at 1pm and providing performance consulting (for free!) at the Google booth at 3:30pm, both on Wednesday, July 22.
As you can see, even though Velocity and EFWS are behind me, there’s still a ton of work left to do. We’ll never be “done” fixing web performance. It’s like cleaning out your closets – they always fill up again. As we make our pages faster, some new challenge arises (mobile, rich media ads, emerging markets with poor connectivity) that requires more investigation and new solutions. Some people might find this depressing or daunting. Me? I’m psyched! ‘Scuse me while I roll up my sleeves.
State of Performance 2008
My Stanford class, CS193H High Performance Web Sites, ended last week. The last lecture was called “State of Performance”. This was my review of what happened in 2008 with regard to web performance, and my predictions and hopes for what we’ll see in 2009. You can see the slides (ppt, GDoc), but I wanted to capture the content here with more text. Let’s start with a look back at 2008.
2008
- Year of the Browser
- This was the year of the browser. Browser vendors have put users first, competing to make the web experience faster with each release. JavaScript got the most attention. WebKit released a new interpreter called SquirrelFish. Mozilla came out with TraceMonkey – SpiderMonkey with Trace Trees added for Just-In-Time (JIT) compilation. Google released Chrome with the new V8 JavaScript engine. And new benchmarks have emerged to put these new JavaScript engines through their paces: SunSpider, V8 Benchmark, and Dromaeo.
In addition to JavaScript improvements, browser performance was boosted in terms of how web pages are loaded. IE8 Beta came out with parallel script loading, where the browser continues parsing the HTML document while downloading external scripts, rather than blocking all progress like most other browsers. WebKit, starting with version 526, has a similar feature, as does Chrome 0.5 and the most recent Firefox nightlies (Minefield 3.1). On a similar note, IE8 and Firefox 3 both increased the number of connections opened per server from 2 to 6. (Safari and Opera were already ahead with 4 connections per server.) (See my original blog posts for more information: IE8 speeds things up, Roundup on Parallel Connections, UA Profiler and Google Chrome, and Firefox 3.1: raising the bar.)
- Velocity
- Velocity 2008, the first conference focused on web performance and operations, launched June 23-24. Jesse Robbins and I served as co-chairs. This conference, organized by O’Reilly, was densely packed – both in terms of content and people (it sold out!). Speakers included John Allspaw (Flickr), Luiz Barroso (Google), Artur Bergman (Wikia), Paul Colton (Aptana), Stoyan Stefanov (Yahoo!), Mandi Walls (AOL), and representatives from the IE and Firefox teams. Velocity 2009 is slated for June 22-24 in San Jose and we’re currently accepting speaker proposals.
- Jiffy
- Improving web performance starts with measuring performance. Measurements can come from a test lab using tools like Selenium and Watir. To get measurements from geographically dispersed locations, scripted tests are possible through services like Keynote, Gomez, Webmetrics, and Pingdom. But the best data comes from measuring real user traffic. The basic idea is to measure page load times using JavaScript in the page and beacon back the results. Many web companies have rolled their own versions of this instrumentation. It isn’t that complex to build from scratch, but there are a few gotchas and it’s inefficient for everyone to reinvent the wheel. That’s where Jiffy comes in. Scott Ruthfield and folks from Whitepages.com released Jiffy at Velocity 2008. It’s Open Source and easy to use. If you don’t currently have real user load time instrumentation, take a look at Jiffy.
- JavaScript: The Good Parts
- Moving forward, the key to fast web pages is going to be the quality and performance of JavaScript. JavaScript luminary Doug Crockford helps light the way with his book JavaScript: The Good Parts, published by O’Reilly. More is needed! We need books and guidelines for performance best practices and design patterns focused on JavaScript. But Doug’s book is a foundation on which to build.
- smush.it
- My former colleagues from the Yahoo! Exceptional Performance team, Stoyan Stefanov and Nicole Sullivan, launched smush.it. In addition to a great name and beautiful web site, smush.it packs some powerful functionality. It analyzes the images on a web page and calculates potential savings from various optimizations. Not only that, it creates the optimized versions for download. Try it now!
- Google Ajax Libraries API
- JavaScript frameworks are powerful and widely used. Dion Almaer and the folks at Google saw an opportunity to help the development community by launching the Ajax Libraries API. This service hosts popular frameworks including jQuery, Prototype, script.aculo.us, MooTools, Dojo, and YUI. Web sites using any of these frameworks can reference the copy hosted on Google’s worldwide server network and gain the benefit of faster downloads and cross-site caching. (Original blog post: Google AJAX Libraries API.)
- UA Profiler
- Okay, I’ll get a plug for my own work in here. UA Profiler looks at browser characteristics that make pages load faster, such as downloading scripts without blocking, max number of open connections, and support for “data:” URLs. The tests run automatically – all you have to do is navigate to the test page from any browser with JavaScript support. The results are available to everyone, regardless of whether you’ve run the tests. I’ve been pleased with the interest in UA Profiler. In some situations it has identified browser regressions that developers have caught and fixed.
2009
Here’s what I think and hope we’ll see in 2009 for web performance.
- Visibility into the Browser
- Packet sniffers (like HTTPWatch, Fiddler, and Wireshark) and tools like YSlow allow developers to investigate many of the “old school” performance issues: compression, Expires headers, redirects, etc. In order to optimize Web 2.0 apps, developers need to see the impact of JavaScript and CSS as the page loads, and gather stats on CPU load and memory consumption. Eric Lawrence and Christian Stockwell’s slides from Velocity 2008 give a hint of what’s possible. Now we need developer tools that show this information.
- Think “Web 2.0”
- Web 2.0 pages are often developed with a Web 1.0 mentality. In Web 1.0, the amount of CSS, JavaScript, and DOM elements on your page was more tolerable because it would be cleared away with the user’s next action. That’s not the case in Web 2.0. Web 2.0 apps can persist for minutes or even hours. If there are a lot of CSS selectors that have to be parsed with each repaint – that pain is felt again and again. If we include the JavaScript for all possible user actions, the size of JavaScript bloats and increases memory consumption and garbage collection. Dynamically adding elements to the DOM slows down our CSS (more selector matching) and JavaScript (think getElementsByTagName). As developers, we need to develop a new way of thinking about the shape and behavior of our web apps in a way that addresses the long page persistence that comes with Web 2.0.
- Speed as a Feature
- In my second month at Google I was ecstatic to see the announcement that landing page load time was being incorporated into the quality score used by AdWords. I think we’ll see more and more that the speed of web pages will become more important to users, more important to aggregators and vendors, and subsequently more important to web publishers.
- Performance Standards
- As the industry becomes more focused on web performance, a need for industry standards is going to arise. Many companies, tools, and services measure “response time”, but it’s unclear that they’re all measuring the same thing. Benchmarks exist for the browser JavaScript engines, but benchmarks are needed for other aspects of browser performance, like CSS and DOM. And current benchmarks are fairly theoretical and extreme. In addition, test suites are needed that gather measurements under more real world conditions. Standard libraries for measuring performance are needed, a la Jiffy, as well as standard formats for saving and exchanging performance data.
- JavaScript Help
- With the emergence of Web 2.0, JavaScript is the future. The Catch-22 is that JavaScript is one of the biggest performance problems in a web page. Help is needed so JavaScript-heavy web pages can still be fast. One specific tool that’s needed is something that takes a monolithic JavaScript payload and splits it into smaller modules, with the necessary logic to know what is needed when. Doloto is a project from Microsoft Research that tackles this problem, but it’s not available publicly. Razor Optimizer attacks this problem and is a good first step, but it needs to be less intrusive to incorporate this functionality.
Browsers also need to make it easier for developers to load JavaScript with less of a performance impact. I’d like to see two new attributes for the SCRIPT tag: DEFER and POSTONLOAD. DEFER isn’t really “new” – IE has had the DEFER attribute since IE 5.5. DEFER is part of the HTML 4.0 specification, and it has been added to Firefox 3.1. One problem is you can’t use DEFER with scripts that utilize document.write, and yet this is critical for mitigating the performance impact of ads. Opera has shown that it’s possible to have deferred scripts still support document.write. This is the model that all browsers should follow for implementing DEFER. The POSTONLOAD attribute would tell the browser to load this script after all other resources have finished downloading, allowing the user to see other critical content more quickly. Developers can work around these issues with more code, but we’ll see wider adoption and more fast pages if browsers can help do the heavy lifting. (A sketch of the post-onload loading pattern appears after this list.)
- Focus on Other Platforms
- Most of my work has focused on the desktop browser. Certainly, more best practices and tools are needed for the mobile space. But to stay ahead of where the web is going we need to analyze the user experience in other settings including automobile, airplane, mass transit, and 3rd world. Otherwise, we’ll be playing catchup after those environments take off.
- Fast by Default
- I enjoy Tom Hanks’ line in A League of Their Own when Geena Davis (“Dottie”) says playing ball got too hard: “It’s supposed to be hard. If it wasn’t hard, everyone would do it. The hard… is what makes it great.” I enjoy a challenge and tackling a hard problem. But doggone it, it’s just too hard to build a high performance web site. The bar shouldn’t be this high. Apache needs to turn off ETags by default. WordPress needs to cache comments better. Browsers need to cache SSL responses when the HTTP headers say to. Squid needs to support HTTP/1.1. The world’s web developers shouldn’t have to write code that anticipates and fixes all of these issues.
Good examples of where we can go are Runtime Page Optimizer and Strangeloop appliances. RPO is an IIS module (and soon Apache) that automatically fixes web pages as they leave the server to minify and combine JavaScript and stylesheets, enable CSS spriting, inline CSS images, and load scripts asynchronously. (Original blog post: Runtime Page Optimizer.) The web appliance from Strangeloop does similar realtime fixes to improve caching and reduce payload size. Imagine combining these tools with smush.it and Doloto to automatically improve the performance of web pages. Companies like Yahoo! and Google might need more customized solutions, but for the other 99% of developers out there, it needs to be easier to make pages fast by default.
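Returning to the script-loading ideas under “JavaScript Help” above, here’s a minimal sketch of how developers approximate the proposed POSTONLOAD behavior today: create the script element only after window.onload fires. The function name and script URL are made up, and the event hookup is simplified.

```javascript
// Load a script only after the window's load event, so it never competes
// with the page's critical resources. This approximates the proposed
// POSTONLOAD behavior using plain DOM scripting.
function loadScriptPostOnload(src) {
  var loader = function() {
    var script = document.createElement("script");
    script.src = src;
    document.getElementsByTagName("head")[0].appendChild(script);
  };
  if (window.addEventListener) {
    window.addEventListener("load", loader, false);
  } else if (window.attachEvent) {
    window.attachEvent("onload", loader);  // older IE
  }
}

loadScriptPostOnload("analytics.js");  // hypothetical script URL
```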
This is a long post, but I still had to leave out a lot of performance highlights from 2008 and predictions for what lies ahead. I look forward to hearing your comments.
Runtime Page Optimizer
The Runtime Page Optimizer (RPO) is an exciting product from Aptimize. RPO runs on a web server applying performance optimizations to pages at runtime, just before the page is sent to the browser. RPO automatically implements many of the best practices from my book and YSlow, so the guys from Aptimize contacted me and showed me an early version. Here are the performance improvements RPO delivers:
- minifies, combines and compresses JavaScript files
- minifies, combines and compresses stylesheets
- combines images into CSS sprites
- inlines images inside the stylesheet
- turns on gzip compression
- sets far future Expires headers
- loads scripts asynchronously
RPO reduces the number of HTTP requests as well as the amount of data that is transmitted, resulting in a page that loads faster. The big question is: how much overhead does this add at runtime? RPO caches the resources it generates (combined scripts, combined stylesheets, sprites). The primary realtime cost is changing the HTML markup. Static pages, after they are massaged, are also cached. Dynamic HTML can be optimized without a significant slowdown – the cost is much less than the time saved by these performance improvements.
RPO is available for SharePoint, ASP.NET and DotNetNuke sites, and they’re working on a version for Apache. This is an exciting step in the world of web performance, shifting the burden of optimization from the web developer to making pages fast by default. They’ve just released RPO, but they’re deploying to at least one global top 1000 web site, so stats should be available soon. I can’t wait to see!
Hammerhead: moving performance testing upstream
Today at The Ajax Experience, I released Hammerhead, a Firebug extension for measuring page load times.
Improving performance starts with metrics. How long does it take for the page to load? Seems like a simple question to answer, but gathering accurate measurements can be a challenge. In my experience, performance metrics exist at four stages along the development process.
- real user data – I love real user metrics. JavaScript frameworks like Jiffy measure page load times from real traffic. When your site is used by a large, diverse audience, data from real page views is ground-truth.
- bucket testing – When you’re getting ready to push a new release, if you’re lucky you can do bucket testing to gather performance metrics. You release the new code to a subset of users while maintaining another set of users on the old code (the “control”). If you sample your user population correctly and gather enough data, comparing the before and after timing information gives you a preview of the latency impact of your next release.
- synthetic or simulated testing – In some situations, it’s not possible to gather real user data. You might not have the infrastructure to do bucket testing and real user instrumentation. Your new build isn’t ready for release, but you still want to gauge where you are with regard to performance. Or perhaps you’re measuring your competitors’ performance. In these situations, the typical solution is to do scripted testing on some machine in your computer lab, or perhaps through a service like Keynote or Gomez.
- dev box – The first place performance testing happens (or should happen) is on the developer’s box. As she finishes her code changes, the developer can see if she made things better or worse. What was the impact of that JavaScript rewrite? What happens if I add another stylesheet, or split my images across two domains?
Performance metrics get less precise as you move from real user data to dev box testing, as shown in Figure 1. That’s because, as you move away from real user data, biases are introduced. For bucket testing, the challenge is selecting users in an unbiased way. For synthetic testing, you need to choose scenarios and test accounts that are representative of real users. Other variables of your real user audience are difficult or impossible to simulate: bandwidth speed, CPU power, OS, browser, geographic location, etc. Attempting to simulate real users in your synthetic testing is a slippery, and costly, slope. Finally, testing on the dev box usually involves one scenario on a CPU that is more powerful than the typical user, and an Internet connection that is 10-100 times faster.

Figure 1 - Precision and ability to iterate along the development process
Given this loss of precision, why would we bother with anything other than real user data? The reason is speed of development. Dev box data can be gathered within hours of a code change, whereas it can take days to gather synthetic data, weeks to do bucket testing, and a month or more to release the code and have real user data. If you wait for real user data to see the impact of your changes, it can take a year to iterate on a short list of performance improvements. To quickly identify the most important performance improvements and their optimal implementation, it’s important to improve our ability to gather performance metrics earlier in the development process: on the dev box.
As a developer, it can be painful to measure the impact of a code change on your dev box. Getting an accurate time measurement is the easy part; you can use YSlow, Fasterfox, or an alert dialog. But then you have to load the page multiple times. The most painful part is transcribing the load times into Excel. Were all the measurements done with an empty cache or a primed cache, or was that even considered?
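For reference, the roll-your-own measurement looks roughly like this (it’s the approach that libraries like Jiffy formalize). The variable names and beacon URL are made up, and note that the start timestamp misses the time spent fetching the HTML document itself:

```javascript
// As early as possible in the HTML (ideally an inline script in the HEAD):
var t_start = new Date().getTime();

// Once the page has finished loading, compute and report the elapsed time.
// (Simplified: a real implementation would chain onto any existing onload handler.)
window.onload = function() {
  var t_load = new Date().getTime() - t_start;
  // Beacon the measurement back to the server (hypothetical URL).
  new Image().src = "/beacon?load_time=" + t_load;
};
```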
Hammerhead makes it easier to gather performance metrics on your dev box. Figure 2 shows the results of hammering a few news web sites with Hammerhead. By virtue of being a Firebug extension, Hammerhead is available in a platform that web developers are familiar with. To set up a Hammerhead test, one or more URLs are added to the list, and the “# of loads” is specified. Once started, Hammerhead loads each URL the specified number of times.

Figure 2 - Hammerhead results for a few news web sites
The next two things aren’t rocket science, but they make a big difference. First, there are two columns of results, one for empty cache and one for primed cache. Hammerhead automatically clears the disk and memory cache, or just the memory cache, in between each page load to gather metrics for both of these scenarios. Second, Hammerhead displays the median and average time measurement. Additionally, you can export the data in CSV format.
Even if you’re not hammering a site, other features make Hammerhead a useful add-on. The Cache & Time panel, shown in Figure 3, shows the current URL’s load time. It also contains buttons to clear the disk and memory cache, or just the memory cache. It has another feature that I haven’t seen anywhere else: you can choose to have Hammerhead clear these caches after every page view. This is a nice feature when I’m loading the same page again and again to see its performance in an empty or a primed cache state. If you forget to switch this back, it gets reset automatically the next time you restart Firefox.

Figure 3 - Cache & Time panel in Hammerhead

If you don’t have Hammerhead open, you can still see the load time in the status bar. Right-clicking the Hammerhead icon gives you access to the cache-clearing options. The ability to clear just the memory cache is another valuable feature I haven’t seen elsewhere. I feel this is the best way to simulate the primed cache scenario, where the user has been to your site recently, but not during the current browser session.
Hammerhead makes it easier to gather performance metrics early in the development process, resulting in a faster development cycle. The biggest bias is that most developers have a much faster bandwidth connection than the typical user. Easy-to-install bandwidth throttlers are a solution. Steve Lamm blogged on Google Code about how Hammerhead can be used with bandwidth throttlers on the dev box, bringing together both ease and greater precision of performance measurements. Give it a spin and let me know what you think.
UA Profiler and Google Chrome
The night before it launched, I sat down to analyze Google Chrome from a performance perspective – what good performance traits did it have and which ones was it missing? I couldn’t get to the internal download links and had to resign myself to waiting for the official launch the next day.
Or did I?
Instead of watching Season 2 of Dexter or finally finishing Thunderstruck, I stayed up late writing some tests that could automatically poke at Chrome to determine its performance personality. It worked great, and made me want to run the same analysis for other browsers, and open it up to the web community to measure any browser of interest. A week later, I’m happy to announce the release of UA Profiler.
UA Profiler looks at browser characteristics that make pages load faster, such as downloading scripts without blocking, max number of open connections, and support for “data:” URLs. The tests run automatically – all you have to do is navigate to the test page from any browser with JavaScript support. The results are available to everyone, regardless of whether you’ve run the tests. See the FAQ for a description of each test and my list of caveats (additional tests that are needed, limited browser detection, etc.). Here’s the overall scores for some popular browsers.
Browser | UA Profiler Score |
---|---|
Chrome | 8/11 |
Firefox 2 | 7/11 |
Firefox 3 | 8/11 |
IE 6,7 | 4/11 |
IE 8 | 7/11 |
Opera 9.5 | 5/11 |
Safari 526 | 8/11 |
Getting back to where I started, let’s look at Chrome’s performance profile. It’s nice to see that Chrome is at the top right out of the chute, having 8 of the 11 performance attributes tested, tied with Firefox 3 and Safari 526. The three performance features Chrome is missing are loading scripts and other resources in parallel, not blocking downloads when a stylesheet is followed by an inline script, and support for prefetch links. Looking at the Chrome user agent string we can see it’s based on Safari 525. If Chrome incorporates Safari 526 improvements, it’ll fix the most important performance trait – loading scripts in parallel. The detailed results show that Chrome has one improvement over Safari: caching redirects that have an expiration date in the future.
UA Profiler will continue to evolve. I have a handful of tests to add and there are several GUI improvements to be made. The main way it’ll grow is by people navigating to the tests with their own browsers, especially browsers that aren’t currently in the list. On each page there are links for contacting me about bugs, mistakes, and suggestions. I look forward to your feedback.
Firebug Working Group meeting
This week I hosted the Firebug Working Group meeting at Google. The goal of the Firebug Working Group is to provide an organization for the many volunteers contributing to the Firebug project. Now that Mozilla is dedicating resources to Firebug, I have great hopes that Firebug development will continue at a quicker pace. All three of the Mozilla members were present: John Resig, Rob Campbell, and Jan Odvarko (by phone). Also present or dialed in were John J. Barton, Doug Crockford, Tony Gentilcore, Kevin Hakman, Kyle Scholz, Dion Almaer, Mike Wilcox, Eric Promislow, Stoyan Stefanov, and myself.
Highlights: Firebug 1.2 is nearing final beta. After testing it’ll be promoted to stable. The main focus for the next release is going to be performance, stability, and testing. Some new features are on the todo list, such as adding new CSS rules, viewing bound DOM event handlers, and exporting CSS changes. More details are available in my notes from the meeting. It’s very exciting to have Mozilla more involved, and bodes well for the future of Firebug.