Books

Even Faster Web Sites in Skiathos

Ioannis Cherouvim, a software engineer from Greece, sent me this photo:

He took Even Faster Web Sites with him on his vacation to Skiathos, a Greek island in the Aegean Sea. In his email, Ioannis mentioned that applying the tips from my previous book to the newspaper web site Eleftherotypia (“free press” in Greek) saved “10GB traffic per day which really made a difference in our datacenter bill.” In addition, the users of the site got a better user experience.

Improved web performance and views of the Aegean Sea from the shores of a Greek island – now that’s heaven. Thanks, Ioannis – you made my day!

back in the saddle: EFWS! Velocity!

The last few months are a blur for me. I go through stages of life like this and look back and wonder how I ever made it through alive (and why I ever set myself up for such stress). The big activities that dominated my time were Even Faster Web Sites and Velocity.

Even Faster Web Sites

Even Faster Web Sites is my second book of web performance best practices. This is a follow-on to my first book, High Performance Web Sites. EFWS isn’t a second edition; it’s more like “Volume 2”. Both books contain 14 chapters, each chapter devoted to a separate performance topic. The best practices described in EFWS are completely new:

  1. Understanding Ajax Performance
  2. Creating Responsive Web Applications
  3. Splitting the Initial Payload
  4. Loading Scripts Without Blocking
  5. Coupling Asynchronous Scripts
  6. Positioning Inline Scripts
  7. Writing Efficient JavaScript
  8. Scaling with Comet
  9. Going Beyond Gzipping
  10. Optimizing Images
  11. Sharding Dominant Domains
  12. Flushing the Document Early
  13. Using Iframes Sparingly
  14. Simplifying CSS Selectors

An exciting addition to EFWS is that six of the chapters were contributed by guest authors: Doug Crockford (Chap 1), Ben Galbraith and Dion Almaer (Chap 2), Nicholas Zakas (Chap 7), Dylan Schiemann (Chap 8), Tony Gentilcore (Chap 9), and Stoyan Stefanov and Nicole Sullivan (Chap 10). Web developers working on today’s content-rich, dynamic web sites will benefit from the advice contained in Even Faster Web Sites.

Velocity

Velocity is the web performance and operations conference that I co-chair with Jesse Robbins. Jesse, former “Master of Disaster” at Amazon and current CEO of Opscode, runs the operations track. I ride herd on the performance side of the conference. This was the second year for Velocity. The first year was a home run, drawing 600 attendees (far more than expected – we only made 400 swag bags) and containing a ton of great talks. Velocity 2009 (held in San Jose June 22-24) was an even bigger success: more attendees (700), more sponsors, more talks, and an additional day for workshops.

The bright spot for me at Velocity was the fact that so many speakers offered up stats on how performance is critical to a company’s business. I wrote a blog post on O’Reilly Radar about this: Velocity and the Bottom Line. Here are some of the excerpted stats:

  • Bing found that a 2 second slowdown caused a 4.3% reduction in revenue/user
  • Google Search found that a 400 millisecond delay resulted in 0.59% fewer searches/user
  • AOL revealed that users that experience the fastest page load times view 50% more pages/visit than users experiencing the slowest page load times
  • Shopzilla undertook a massive performance redesign reducing page load times from ~7 seconds to ~2 seconds, with a corresponding 7-12% increase in revenue and 50% reduction in hardware costs

I love optimizing web performance because it raises the quality of engineering, reduces inefficiencies, and is better for the planet. But to get widespread adoption we need to motivate the non-engineering parts of the organization. That’s why these case studies on web performance improving the user experience as well as the company’s bottom line are important. I applaud these companies for not only tracking these results, but being willing to share them publicly. You can get more details from the Velocity videos and slides.

Back in the Saddle

Over the next six months, I’ll be focusing on open sourcing many of the tools I’ve soft launched, including UA Profiler, Cuzillion, Hammerhead, and Episodes. These are already “open source” per se, but they’re not active projects with a code repository, bug database, roadmap, and active contributors. I plan on fixing that and will discuss this more during my presentation at OSCON this week. If you’re going to OSCON, I hope you’ll attend my session. If not, I’ll also be signing books at 1pm and providing performance consulting (for free!) at the Google booth at 3:30pm, both on Wednesday, July 22.

As you can see, even though Velocity and EFWS are behind me, there’s still a ton of work left to do. We’ll never be “done” fixing web performance. It’s like cleaning out your closets – they always fill up again. As we make our pages faster, some new challenge arises (mobile, rich media ads, emerging markets with poor connectivity) that requires more investigation and new solutions. Some people might find this depressing or daunting. Me? I’m psyched! ‘Scuse me while I roll up my sleeves.

Flushing the Document Early

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites. Posts in this series include: chapters and contributing authors, Splitting the Initial Payload, Loading Scripts Without Blocking, Coupling Asynchronous Scripts, Positioning Inline Scripts, Sharding Dominant Domains, Flushing the Document Early, Using Iframes Sparingly, and Simplifying CSS Selectors.

The Performance Golden Rule reminds us that for most web sites, 80-90% of the load time is on the front end. However, for some pages, the time it takes the back end to generate the HTML document is more than 10-20% of the overall page load time. Even if the HTML document is generated quickly on the back end, it may still take a long time before it’s received by the browser for users in remote locations or with slow connections. While the HTML document is being generated on the back end and sent over the wire, the browser is waiting idly. What a waste! Instead of letting the browser sit there doing nothing, web developers can use flushing to jumpstart page loading even before the HTML document response is fully received.

Flushing is when the server sends the initial part of the HTML document to the client before the entire response is ready. All major browsers start parsing the partial response. When done correctly, flushing results in a page that loads and feels faster. The key is choosing the right point at which to flush the partial HTML document response. The flush should occur before the expensive parts of the back end work, such as database queries and web service calls. But the flush should occur after the initial response has enough content to keep the browser busy. The part of the HTML document that is flushed should contain some resources as well as some visible content. If resources (e.g., stylesheets, external scripts, and images) are included, the browser gets an early start on its download work. If some visible content is included, the user receives feedback sooner that the page is loading.

Most HTML templating languages, including PHP, Perl, Python, and Ruby, contain a “flush” function. Getting flushing to work can be tricky. Problems arise when output is buffered, chunked encoding is disabled, proxies or anti-virus software interfere, or the flushed response is too small. Scanning the comments in PHP’s flush documentation shows that it can be hard to get all the details correct. Perhaps that’s why most of the U.S. top 10 sites don’t flush the document early. One that does is Google Search.
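As a rough illustration of where the flush goes, here is a minimal sketch in JavaScript. It assumes a Node.js-style response object with `write()` and `end()` methods; the markup, the file names, and the function names `flushEarly` and `finishPage` are hypothetical and not taken from any real site.

```javascript
// Hypothetical early chunk: one stylesheet and one image to start downloading,
// plus some visible content so the user sees that the page is loading.
const EARLY_CHUNK =
  '<!DOCTYPE html><html><head><title>Results</title>' +
  '<link rel="stylesheet" href="/site.css"></head>' +
  '<body><img src="/logo.png" alt="logo"><p>Loading results...</p>';

// Send the cheap part of the page BEFORE any expensive back-end work starts.
function flushEarly(res) {
  res.write(EARLY_CHUNK);
}

// Send the rest once the slow work (database queries, web service calls) is done.
// Assumes every entry in `rows` is already HTML-escaped.
function finishPage(res, rows) {
  const items = rows.map((row) => '<li>' + row + '</li>').join('');
  res.write('<ul>' + items + '</ul>');
  res.end('</body></html>');
}
```

In a request handler, `flushEarly(res)` would be called first, then the slow query would run, and `finishPage(res, rows)` would be called from its callback. Whether the early chunk actually reaches the browser right away still depends on the buffering, chunked encoding, and proxy issues listed above.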

Google Search flushing early

The HTTP waterfall chart for Google Search shows the benefits of flushing the document early. While the HTML document response (the first bar) is still being received, the browser has already started downloading one of the images used in the page, nav_logo4.png (the second bar). By flushing the document early, you make your pages start downloading resources and rendering content more quickly. This is a benefit that all users will appreciate, especially those with slow Internet connections and high latency.

Sharding Dominant Domains

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites.

Rule 9 from High Performance Web Sites says reducing DNS lookups makes web pages faster. This is true, but in some situations, it’s worthwhile to take a bunch of resources that are being downloaded on a single domain and split them across multiple domains. I call this domain sharding. Doing this allows more resources to be downloaded in parallel, reducing the overall page load time.

To determine if domain sharding makes sense, you have to see if the page has a dominant domain being used for downloading resources. The following HTTP waterfall chart, for Yahoo.com, is indicative of a site being slowed down by a dominant domain. This waterfall chart shows the page being downloaded in Internet Explorer 7, which only allows two parallel downloads per hostname. The vertical lines show that typically there are only two simultaneous downloads at any given time. Looking at the resource URLs, we see that almost all of them are served from “l.yimg.com.” Sharding these resources across two domains, such as “l1.yimg.com” and “l2.yimg.com”, would cause the resources to download in about half the time.

Most of the U.S. top ten web sites do domain sharding. YouTube uses i1.ytimg.com, i2.ytimg.com, i3.ytimg.com, and i4.ytimg.com. Live Search uses ts1.images.live.com, ts2.images.live.com, ts3.images.live.com, and ts4.images.live.com. Both of these sites are sharding across four domains. What’s the optimal number? Yahoo! released a study that recommends sharding across at least two, but no more than four, domains. Above four, performance actually degrades.

Not all browsers are restricted to just two parallel downloads per hostname. Opera 9+ and Safari 3+ do four downloads per hostname. Internet Explorer 8, Firefox 3, and Chrome 1+ do six downloads per hostname. Sharding across two domains is a good compromise that improves performance in all browsers.
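One way to spread resources across two domains is to pick the hostname from a hash of the resource path. The sketch below is only an illustration under that assumption: the hostnames and the helper names `hashPath` and `shardUrl` are hypothetical.

```javascript
// Hypothetical shard hostnames, in the spirit of "l1.yimg.com" and "l2.yimg.com".
const SHARDS = ['static1.example.com', 'static2.example.com'];

// Simple string hash, kept within unsigned 32-bit range.
function hashPath(path) {
  let hash = 0;
  for (let i = 0; i < path.length; i++) {
    hash = (hash * 31 + path.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Map a resource path to a full URL on one of the shard hostnames.
// The same path always maps to the same hostname.
function shardUrl(path, shards = SHARDS) {
  const host = shards[hashPath(path) % shards.length];
  return 'http://' + host + path;
}
```

Hashing the path, rather than alternating hostnames in page order, means a given image gets the same URL on every page, so a copy already in the browser cache can be reused.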

Positioning Inline Scripts

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites.

My first three chapters in Even Faster Web Sites (Splitting the Initial Payload, Loading Scripts Without Blocking, and Coupling Asynchronous Scripts) focus on external scripts. But inline scripts block downloads and rendering just like external scripts do. The Inline Scripts Block example contains two images with an inline script between them. The inline script takes five seconds to execute. Looking at the HTTP waterfall chart, we see that the second image (which occurs after the inline script) is blocked from downloading until the inline script finishes executing.

Inline scripts block rendering, just like external scripts, but are worse. External scripts only block the rendering of elements below them in the page. Inline scripts block the rendering of everything in the page. You can see this by loading the Inline Scripts Block example in a new window and counting off the five seconds before anything is displayed.

Here are three strategies for avoiding, or at least mitigating, the blocking behavior of inline scripts.

  • move inline scripts to the bottom of the page – Although this still blocks rendering, resources in the page aren’t blocked from downloading.
  • use setTimeout to kick off long executing code – If you have a function that takes a long time to execute, launching it via setTimeout allows the page to render and resources to download without being blocked.
  • use DEFER – In Internet Explorer and Firefox 3.1 or greater, adding the DEFER attribute to the inline SCRIPT tag avoids the blocking behavior of inline scripts.
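The setTimeout strategy can be sketched as follows. This is a minimal illustration, not code from the examples; the helper name `runLater` is hypothetical, and the `schedule` parameter exists only so the helper can be exercised outside a browser.

```javascript
// Queue a long-running task instead of running it inline.
// The inline script block returns immediately, so the browser can
// keep rendering and downloading before the task actually runs.
function runLater(task, schedule = setTimeout) {
  schedule(task, 0);
}
```

An inline script would then call something like `runLater(buildLargeMenu)` rather than `buildLargeMenu()`, where `buildLargeMenu` stands in for whatever slow function the page has.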

It’s especially important to avoid placing inline scripts between a stylesheet and any other resources in the page. Doing so makes it appear as if the stylesheet were blocking the resources that follow from downloading. The reason for this behavior is that all major browsers preserve the order of CSS and JavaScript. The stylesheet has to be fully downloaded, parsed, and applied before the inline script is executed. And the inline script must be executed before the remaining resources can be downloaded. Therefore, resources that follow a stylesheet and inline script are blocked from downloading.

It’s better to explain this with an example. Below is the HTTP waterfall chart for eBay.com. Normally, the stylesheets and the first script would be downloaded in parallel.[1] But after the stylesheet, there’s an inline script (with just one line of code). This causes the external script to be blocked from downloading until the stylesheet is received.

Stylesheet followed by inline script blocks downloads on eBay

This unnecessary blocking also happens on MSN, MySpace, and Wikipedia. The workaround is to move inline scripts above stylesheets or below other resources. This will increase parallel downloads, resulting in a faster page.

[1] All these resources are on the same domain, so downloading them all in parallel is possible on browsers that support more than two connections per server, including Internet Explorer 8, Firefox 3, Safari, Chrome, and Opera.
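The ordering problem described above can be checked mechanically. The following sketch is a rough heuristic only: it assumes the page has already been reduced to a list of tags in document order, and both the `type` labels and the function name `findBlockingInlineScripts` are hypothetical.

```javascript
// Each entry is one tag in document order. `type` is assumed to be
// 'stylesheet', 'inline-script', or 'resource' (external script, image, ...).
// Returns the indexes of inline scripts that sit after a stylesheet
// and before at least one other resource.
function findBlockingInlineScripts(tags) {
  const hits = [];
  let seenStylesheet = false;
  tags.forEach((tag, index) => {
    if (tag.type === 'stylesheet') {
      seenStylesheet = true;
    } else if (tag.type === 'inline-script' && seenStylesheet) {
      const hasLaterResource = tags
        .slice(index + 1)
        .some((later) => later.type !== 'inline-script');
      if (hasLaterResource) hits.push(index);
    }
  });
  return hits;
}
```

A stylesheet followed by an inline script and then an external script is flagged, while the two workarounds (inline script first, or inline script last) are not.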

Coupling asynchronous scripts

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites.

Much of my recent work has been around loading external scripts asynchronously. When scripts are loaded the normal way (<script src="...">), they block all other downloads in the page, and any elements below the script are blocked from rendering. This can be seen in the Put Scripts at the Bottom example. Loading scripts asynchronously avoids this blocking behavior, resulting in a faster-loading page.

One issue with async script loading is dealing with inline scripts that use symbols defined in the external script. If the external script is loading asynchronously without thought to the inlined code, race conditions may result in undefined symbol errors. It’s necessary to ensure that the async external script and the inline script are coupled in such a way that the inlined code isn’t executed until after the async script finishes loading.

There are a few ways to couple async scripts with inline scripts.

  • window’s onload – The inlined code can be tied to the window’s onload event. This is pretty simple to implement, but the inlined code won’t execute as early as it could.
  • script’s onreadystatechange – The inlined code can be tied to the script’s onreadystatechange and onload events. (You need to implement both to cover all popular browsers.) This code is lengthier and more complex, but ensures that the inlined code is called as soon as the script finishes loading.
  • hardcoded callback – The external script can be modified to explicitly kick off the inlined script through a callback function. This is fine if the external script and inline script are being developed by the same team, but doesn’t provide the flexibility needed to couple third-party scripts with inlined code.
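To make the second option concrete, here is a minimal sketch of tying a callback to both the onload and onreadystatechange events. It is an illustration under assumptions, not the code used on UA Profiler: the helper name `loadScript` is hypothetical, and the document object is passed in as `doc` (normally `document`) to avoid hidden globals.

```javascript
// Load `src` asynchronously and call `callback` once, as soon as it is ready.
function loadScript(doc, src, callback) {
  const script = doc.createElement('script');
  let done = false;

  function ready() {
    if (done) return; // guard: some browsers could fire both events
    done = true;
    script.onload = script.onreadystatechange = null;
    callback();
  }

  // Browsers that fire a load event on script elements.
  script.onload = ready;
  // Older Internet Explorer reports progress through readyState instead.
  script.onreadystatechange = function () {
    if (script.readyState === 'loaded' || script.readyState === 'complete') {
      ready();
    }
  };

  script.src = src;
  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}
```

With this helper, the inlined code becomes the callback, for example `loadScript(document, "sorttable-async.js", function () { sorttable.init(); })`.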

In this blog post I talk about two parallel (no pun intended) issues: how async scripts make the page load faster, and how async scripts and inline scripts can be coupled using a variation of John Resig’s Degrading Script Tags pattern. I illustrate these through a recent project I completed to make the UA Profiler results sortable. I did this using Stuart Langridge’s sorttable script. It took me ~5 minutes to add his script to my page and make the results table sortable. With a little more work I was able to speed up the page by more than 30% by using the techniques of async script loading and coupling async scripts.

Normal Script Tags

I initially added Stuart Langridge’s sorttable script to UA Profiler in the typical way (<script src="...">), as seen in the Normal Script Tag implementation. The HTTP waterfall chart is shown in Figure 1.

Figure 1: Normal Script Tags HTTP waterfall chart

The table sorting worked, but I wasn’t happy with how it made the page slower. In Figure 1 we see how my version of the script (called “sorttable-async.js”) blocks the only other HTTP request in the page (“arrow-right-20x9.gif”), which makes the page load more slowly. These waterfall charts were captured using Firebug 1.3 beta. This new version of Firebug draws a vertical red line where the onload event occurs. (The vertical blue line is the DOMContentLoaded event.) For this Normal Script Tag version, onload fires at 487 ms.

Asynchronous Script Loading

The “sorttable-async.js” script isn’t necessary for the initial rendering of the page – sorting columns is only possible after the table has been rendered. This situation (external scripts that aren’t used for initial rendering) is a prime candidate for asynchronous script loading. The Asynchronous Script Loading implementation loads the script asynchronously using the Script DOM Element approach:

var script = document.createElement('script');
script.src = "sorttable-async.js";
script.text = "sorttable.init()"; // this is explained in the next section
document.getElementsByTagName('head')[0].appendChild(script);

The HTTP waterfall chart for this Asynchronous Script Loading implementation is shown in Figure 2. Notice how using an asynchronous loading technique avoids the blocking behavior – “sorttable-async.js” and “arrow-right-20x9.gif” are loaded in parallel. This pulls in the onload time to 429 ms.

Figure 2: Asynchronous Script Loading HTTP waterfall chart

John Resig’s Degrading Script Tags Pattern

The Asynchronous Script Loading implementation makes the page load faster, but there is still one area for improvement. The default sorttable implementation bootstraps itself by attaching “sorttable.init()” to the onload handler. A performance improvement would be to call “sorttable.init()” in an inline script to bootstrap the code as soon as the external script was done loading. In this case, the “API” I’m using is just one function, but I wanted to try a pattern that would be flexible enough to support a more complex situation where the module couldn’t assume what API was going to be used.

I previously listed various ways that an inline script can be coupled with an asynchronously loaded external script: window’s onload, script’s onreadystatechange, and hardcoded callback. Instead, I used a technique derived from John Resig’s Degrading Script Tags pattern. John describes how to couple an inline script with an external script, like this:

<script src="jquery.js">
jQuery("p").addClass("pretty");
</script>

The way his implementation works is that the inlined code is only executed after the external script is done loading. There are several benefits to coupling inline and external scripts this way:

  • simpler – one script tag instead of two
  • clearer – the inlined code’s dependency on the external script is more obvious
  • safer – if the external script fails to load, the inlined code is not executed, avoiding undefined symbol errors

It’s also a great pattern to use when the external script is loaded asynchronously. To use this technique, I had to change both the inlined code and the external script. For the inlined code, I added the third line shown above that sets the script.text property. To complete the coupling, I added this code to the end of “sorttable-async.js”:

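// Find the script tag that loaded this file and run the code inlined in it.
var scripts = document.getElementsByTagName("script");
var cntr = scripts.length;
while ( cntr ) {
    var curScript = scripts[cntr-1];
    if ( -1 != curScript.src.indexOf('sorttable-async.js') ) {
        // innerHTML holds the coupled inline code, e.g. "sorttable.init()"
        eval( curScript.innerHTML );
        break;
    }
    cntr--;
}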

This code iterates over all scripts in the page until it finds the script block that loaded itself (in this case, the script with src containing “sorttable-async.js”). It then evals the code that was added to the script (in this case, “sorttable.init()”) and thus bootstraps itself. (A side note: although the line of code was added using the script’s text property, here it’s referenced using the innerHTML property. This is necessary to make it work across browsers.) With this optimization, the external script loads without blocking other resources, and the inlined code is executed as soon as possible.

Lazy Loading

The load time of the page can be improved even more by lazyloading this script (loading it dynamically as part of the onload handler). The code behind this Lazyload version just wraps the previous code within the onload handler:

// Wait for onload, then load the script together with its coupled inline code.
window.onload = function() {
    var script = document.createElement('script');
    script.src = "sorttable-async.js";
    script.text = "sorttable.init()";
    document.getElementsByTagName('head')[0].appendChild(script);
};

This situation absolutely requires this script coupling technique. The previous bootstrapping code that called “sorttable.init()” in the onload handler won’t be called here because the onload event has already passed. The benefit of lazyloading the code is that the onload time occurs even sooner, as shown in Figure 3. The onload event, indicated by the vertical red line, occurs at ~320 ms.

Figure 3: Lazyloading HTTP waterfall chart

Conclusion

Loading scripts asynchronously and lazyloading scripts improve page load times by avoiding the blocking behavior that scripts typically cause. This is shown in the different versions of adding sorttable to UA Profiler:

  • Normal Script Tag – 487 ms
  • Asynchronous Script Loading – 429 ms
  • Lazyload – ~320 ms

The times above indicate when the onload event occurred. For other web apps, improving when the asynchronously loaded functionality is attached might be a higher priority. In that case, the Asynchronous Script Loading version is slightly better (~400 ms versus 417 ms). In both cases, being able to couple inline scripts with the external script is a necessity. The technique shown here is a way to do that while also improving page load times.
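As a quick check on the “more than 30%” figure quoted earlier, the improvement can be computed from the onload times given in the three sections above (487 ms, 429 ms, and roughly 320 ms). The helper name `percentFaster` is just for illustration, and the lazyload result is only as precise as the approximate 320 ms reading.

```javascript
// Percentage reduction in onload time relative to the baseline.
function percentFaster(beforeMs, afterMs) {
  return ((beforeMs - afterMs) / beforeMs) * 100;
}

const asyncGain = percentFaster(487, 429).toFixed(1); // "11.9"
const lazyGain = percentFaster(487, 320).toFixed(1);  // "34.3"
```

So asynchronous loading alone trims about 12% off the onload time, and lazyloading brings the total reduction to about 34%.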

The Art of Capacity Planning

This week I opened a beautiful package from O’Reilly. It contained John Allspaw’s new book, The Art of Capacity Planning. As you can see, the cover is a delight to look at. But you shouldn’t judge a book by its cover! Luckily, what’s inside the book is also a delight.

Now, let me make a disclaimer. I know John. We first crossed paths at Yahoo!, and have worked together on some side projects, most notably the Velocity conference. I know he’s an expert in the area of capacity planning. I know he’s highly regarded by other leaders in the operations space. I’ve heard him speak at conferences, brainstorm in small group discussions, and share his experiences with others. Given this background, I expected a book full of lessons learned, practical advice, and real world takeaways. Thankfully, for all of us readers, The Art of Capacity Planning delivers all of this and more.

Right out of the gate, John covers a topic near and dear to my heart: metrics. His advice? “Measure, measure, measure.” John reinforces this by including an incredible number of charts throughout the book. He goes on to say that our measurement tools need to provide an easy way to:

  • Record and store data over time
  • Build custom metrics
  • Compare metrics from various sources
  • Import and export metrics

As I read the book, I found myself nodding and thinking, “yes, yes, this is exactly what I learned!” Although it’s been more than five years since I was buildmaster for My Yahoo!, the advice John provides really resonated with me, like this one: “Homogenize hardware to halt headaches”. (You have to love the alliteration, too.)

In a thin book that’s easy to read, John covers a large number of topics. He talks about load testing, with pointers to tools like Httperf and Siege. There are several sections that talk about caching architectures and the use of Squid. He provides guidelines when it comes to deployment, such as making all changes happen in one place, the importance of defining roles and services, and ensuring new servers start working automatically. At the end he even manages to cover virtualization and cloud computing, and how they come into play during capacity planning.

The Art of Capacity Planning is full of sage advice from a seasoned veteran, like this one: “The moral of this little story? When faced with the question of capacity, try to ignore those urges to make existing gear faster, and focus instead on the topic at hand: finding out what you need, and when.” When I read a technical book, I’m really looking for takeaways. That’s why I loved The Art of Capacity Planning, and I think you will, too.

Splitting the Initial Payload

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites.

The growing adoption of Ajax and DHTML means today’s web pages have more JavaScript than ever before. The average top ten U.S. web site[1] contains 252K of JavaScript. JavaScript slows pages down. When the browser starts downloading or executing JavaScript it won’t start any other parallel downloads. Also, anything below an external script is not rendered until the script is completely downloaded and executed. Even in the case where external scripts are cached the execution time can still slow down the user experience and thwart progressive rendering.

Every line of JavaScript code matters to a fast loading page. And yet, as shown in Table 1, the average top ten U.S. web site only executes 25% of the JavaScript functionality before the onload event.[2] Why track this relative to the onload event? Any code needed after the onload event (for dropdown menus, Ajax requests, etc.) could be downloaded later, making the initial page render more quickly.
