Velocity: Google Maps API performance

July 7, 2010 1:09 pm | 1 Comment

Several months ago I saw Susannah Raub do an internal tech talk on the performance improvements behind Google Maps API v3. She kindly agreed to reprise the talk at Velocity. Luckily it was videotaped, and the slides (ODP) are available, too. It’s a strong case study on improving performance, is valuable for developers working with the Google Maps API, and has a few takeaways that I’ll blog about more soon.

Susannah starts off bravely by showing how Google Maps API v2 takes 17 seconds to load on an iPhone. This was the motivation for the work on v3 – to improve performance. In order to improve performance you have to start by measuring it. The Google Maps team broke down “performance” into three categories:

  • user perceived latency – how long it takes for the page to appear usable, in this case for the map to be rendered
  • page ready time – how long it takes for the page to become usable, e.g. for the map to be draggable
  • page load time – how long it takes for all the elements to be present; in the case of maps this includes all of the map controls being loaded and working

The team wanted to measure all of these areas. It’s fairly easy to find tools to measure performance on the desktop – the Google Maps team used HttpWatch. Performance tools, or any development tools for that matter, are harder to come by in the mobile space. But the team especially wanted to focus on creating a fast experience on mobile devices. They ended up using Fiddler as a proxy to gain visibility into the page’s performance profile.
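
These three moments can also be timestamped from within the page itself with a few lines of JavaScript. Here’s a rough sketch (the beacon URL is made up, and the comments mark where the map-specific callbacks would go – they’re placeholders, not actual Maps API events):

var t0 = new Date().getTime();   // captured at the top of the HTML document

function logTime(label) {
  var delta = new Date().getTime() - t0;
  new Image().src = '/timing?label=' + label + '&ms=' + delta;   // hypothetical beacon endpoint
}

// user perceived latency: call logTime('rendered') from whatever callback
// fires when the map tiles have been drawn
// page ready time: call logTime('ready') when the map becomes draggable
// page load time: all controls and resources have arrived
window.onload = function() { logTime('load'); };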

future blog post #1: Coincidentally, today I saw a tweet about Craig Dunn’s instructions for Monitoring iPhone web traffic (with Fiddler). This is a huge takeaway for anyone doing web development for mobile. At Velocity, Eric Lawrence (creator of Fiddler) announced Fiddler support for the HTTP Archive Specification. The HTTP Archive (HAR) format is a specification I initiated over a year ago with folks from HttpWatch and Firebug. HAR is becoming the industry standard just as I had hoped and is now supported in numerous developer tools. I wrote one such tool, called HAR to Page Speed, that takes a HAR file and displays a Page Speed performance analysis as well as an HTTP waterfall chart. Putting all these pieces together, you can now load a web site on your iPhone, monitor it with Fiddler, export it to a HAR file, and upload it to HAR to Page Speed to find out how it performs. Given Fiddler’s extensive capabilities for creating addons, I expect it won’t be long before all of this is built into Fiddler itself.

In the case of Google Maps API, the long pole in the tent was main.js. They have a small (15K) bootstrap script that loads main.js (180K). (All of the script sizes in this blog post are UNcompressed sizes.) The performance impact of main.js was especially bad on mobile devices, where caching is more limited. They compiled their JavaScript (using Closure) and combined three HTTP requests into one.

future blog post #2: The team also realized that although their JavaScript download was large, the revisions between releases were small. They created a framework for downloading only deltas when possible, which cut seconds off their download times. More on this tomorrow.

These performance improvements helped, but they wanted to go further. They redesigned their code using an MVC architecture. As a result, the initial download only needs to include the models, which are small. The larger views and controllers that do all the heavy lifting are loaded asynchronously. This reduced the initial bootstrap script from 15K to 4K, and the main.js from 180K to 33K.
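
Here’s a minimal sketch of that shape – not the Maps API code, just the pattern, with made-up names and URL: the small bootstrap defines a lightweight model, then pulls the heavier view/controller code in asynchronously.

// bootstrap (small, loaded up front): lightweight model only
function MapModel(lat, lng, zoom) {
  this.lat = lat;
  this.lng = lng;
  this.zoom = zoom;
}

var model = new MapModel(37.4, -122.1, 13);

// load the heavy views/controllers without blocking the page
var script = document.createElement('script');
script.src = 'http://example.com/maps-views-controllers.js';   // hypothetical module URL
document.getElementsByTagName('head')[0].appendChild(script);

// the module invokes this callback when it arrives, passing in its render function
function onViewsLoaded(renderMap) {
  renderMap(model);
}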

The results speak for themselves. Susannah concludes by showing how v3 of Google Maps API takes only 5 seconds to load on the iPhone, compared to v2’s 17 seconds. The best practices the team employed for making Google Maps faster are valuable for anyone working on JavaScript-heavy web sites. Take a look at the video and slides, and watch here for a follow-up on Fiddler for iPhone and loading JavaScript deltas.


Velocity: Top 5 Mistakes of Massive CSS

July 3, 2010 12:22 pm | 3 Comments

Nicole Sullivan and Stoyan Stefanov had the #3 highest rated session at Velocity – The Top 5 Mistakes of Massive CSS. Nicole (aka “stubbornella”) wrote a blog post summarizing their work. The motivation for paying attention to CSS comes from these stats showing how bad things are across the Alexa Top 1000:

  • 42% don’t GZIP CSS
  • 44% have more than 2 CSS external files
  • 56% serve CSS with cookies
  • 62% don’t minify
  • 21% have greater than 100K of CSS

Many of these problems are measured by YSlow and Page Speed, but the solutions still aren’t widely adopted. Nicole goes on to highlight more best practices for reducing the impact of CSS including minimizing float and using a reset stylesheet.

Check out the slides and video of Nicole and Stoyan’s talk to learn how to avoid having CSS block your page from rendering.


Back to blogging after Velocity

July 2, 2010 12:04 pm | 3 Comments

The last few weeks have been hectic. I was in London and Paris for 10 days. I returned a day before Velocity started. Most of you experienced or have heard about the awesomeness that was Velocity – great speakers, sponsors, and attendees. Right after Velocity I headed up to Foo Camp at O’Reilly HQ. This week I’ve been catching up on all the email that accumulated over three weeks.

During this time blogging has taken a backseat. But now that my head is above water I want to start relaying some of the key takeaways from Velocity. I wrote my Velocity wrap-up and mentioned my favorite sessions. But here are the top 10 sessions based on the attendee ratings:

  1. Choose Your Own Adventure by Adam Jacob, Opscode (unofficial video snippets)
  2. TCP and the Lower Bound of Web Performance by John Rauser, Amazon (slides)
  3. The Top 5 Mistakes of Massive CSS by Nicole Sullivan, consultant and Stoyan Stefanov, Yahoo! (video)
  4. Building Performance Into the New Yahoo! Homepage by Nicholas Zakas, Yahoo! (slides)
  5. Hidden Scalability Gotchas in Memcached and Friends by Neil Gunther, Performance Dynamics, and Shanti Subramanyam and Stefan Parvu, Oracle (video)
  6. Internet Explorer 9 by Jason Weber, Microsoft (slides)
  7. Creating Cultural Change by John Rauser, Amazon (video)
  8. Scalable Internet Architectures by Theo Schlossnagle, OmniTI (slides)
  9. The Upside of Downtime: How to Turn a Disaster Into an Opportunity by Lenny Rachitsky, Webmetrics/Neustar (video, slides)
  10. Tied for #10:
    1. Metrics 101: What to Measure on Your Website by Sean Power, Watching Websites (slides)
    2. The 90-Minute Optimization Life Cycle: Fast by Default Before Our Eyes? by Joshua Bixby and Hooman Beheshti, Strangeloop Networks
    3. Progressive Enhancement: Tools and Techniques by Annie Sullivan, Google (slides)
    4. Chrome Fast. by Mike Belshe, Google (slides)

Some things to highlight: Adam Jacob is an incredible speaker – insightful and funny. John Rauser is the speaker I enjoyed the most – he shows up twice, at #2 and #7. Two of the browser presentations made the top 10. The workshops this year were incredible and very well attended – four of them registered in the top 10 (#8, #10a, #10b, and #10c). Annie Sullivan rated high and it was her first time speaking at a conference.

For the last two years at Velocity we’ve only been able to videotape the talks in one room, which means about a third of this year’s talks were videotaped. Four of these top rated sessions were taped. Next year I’ll try to get more of the top speakers in the video room. I’ve asked the five speakers without slides to upload them to the Velocity web site. Check back next week if you want those.

I actually feel electricity running up and down my spine looking over these talks. To think I had something to do with pulling these gurus together and offering a place for them to share what they know – it’s humbling and exhilarating at the same time. I’ll be doing some more Velocity-related posts on specific sessions next week, so stay tuned.


Velocity is coming fast June 22-24

June 4, 2010 1:09 am | 1 Comment

Jesse Robbins and I co-chair Velocity – the web performance and operations conference run by O’Reilly. This year’s Velocity is coming fast (get it?) – June 22-24 at the Santa Clara Convention Center. This is the third year for Velocity. The first two years sold out, and this year is looking even stronger. We’ve added a third track so that’s 50% more workshops and sessions. That means more gurus to talk to and more topics to choose from.

Jesse did a post today about the ops side of the conference. Here are some of my favorites from the web performance track:

  • Mobile Web High Performance – This workshop (workshops are on Tues June 22) is by O’Reilly author Maximiliano Firtman. Mobile is big and only a few people (including Maximiliano) know the performance side of mobile. His book? Programming the Mobile Web
  • Progressive Enhancement: Tools and Techniques – The most important pattern I recommend for today’s web sites is to render the page quickly and adorn later with JavaScript. Some of the more advanced web apps are doing this, but otherwise it’s not a well known pattern. Annie is one of my favorite performance developers at Google. She has built sites that do progressive enhancement, so I’m super psyched that she agreed to give this workshop. Very important for anyone with a bunch of JavaScript in their site.
  • Building Performance Into the New Yahoo! Homepage – Nicholas Zakas, JavaScript performance guru, talks about the real world story of making Yahoo! front page twice as fast.
  • The Top 5 Mistakes of Massive CSS – Nicole Sullivan (consultant) and Stoyan Stefanov (Yahoo!) share their lessons learned optimizing the CSS for Facebook and Yahoo! Search.
  • The Firefox, Chrome, and Internet Explorer teams will be there to talk about the latest performance improvements to their browsers. That’s followed by the Browser Panel where you get to ask more questions.
  • Lightning Demos on Wed and Thurs will give everyone a chance to see dynaTrace, Firebug, YSlow, Page Speed, HttpWatch, AOL (Web)Pagetest, Speed Tracer, and Fiddler.
  • We have an amazing line-up of keynoters: Wednesday morning features James Hamilton (Amazon), Urs Hölzle (Google), and Tim O’Reilly (O’Reilly Media). All in one morning! Thursday brings back John Adams (Twitter) and Bobby Johnson (Facebook). Their Velocity 2009 talks were standing room only.

I’m looking forward to all the talks and catching up with the speakers. I’m most excited about the hallway conversations. It’s great hearing about what other developers have discovered during their own performance optimization projects. I especially enjoy how accessible the speakers are. It’s amazing how willing everyone is to share what they’ve learned and to work together to advance the state of web performance and operations. After all, that’s what Velocity is all about.


Frontend SPOF

June 1, 2010 7:49 pm | 9 Comments

My evangelism of high performance web sites started off in the context of quality code and development best practices. It’s easy for a style of coding to permeate throughout a company. Developers switch teams. Code is copied and pasted (especially in the world of web development). If everyone is developing in a high performance way, that’s the style that will characterize how the company codes.

This argument of promoting development best practices gained traction in the engineering quarters of the companies I talked to, but performance improvements continued to get backburnered in favor of new features and content that appealed to the business side of the organization. Improving performance wasn’t considered as important as other changes. Everyone assumed users wanted new features and that’s what got the most attention.

It became clear to me that we needed to show a business case for web performance. That’s why the theme for Velocity 2009 was “the impact of performance on the bottom line”. Since then there have been numerous studies released that have shown that improving performance does improve the bottom line. As a result, I’m seeing the business side of many web companies becoming strong advocates for Web Performance Optimization.

But there are still occasions when I have a hard time convincing a team that focusing on web performance, specifically frontend performance, is important. Shaving off hundreds (or even thousands) of milliseconds just doesn’t seem worthwhile to them. That’s when I pull out the big guns and explain that loading scripts and stylesheets in the typical way creates a frontend single point of failure that can bring down the entire site.

Examples of Frontend SPOF

The thought that simply adding a script or stylesheet to your web page could make the entire site unavailable surprises many people. Rather than focusing on CSS mistakes and JavaScript errors, the key is to think about what happens when a resource request times out. With this clue, it’s easy to create a test case:

<html>
<head>
<script src="http://www.snippet.com/main.js" type="text/javascript">
  </script>
</head>
<body>
Here's my page!
</body>
</html>

This HTML page looks pretty normal, but if snippet.com is overloaded the entire page is blank waiting for main.js to return. This is true in all browsers.

Here are some examples of frontend single points of failure and the browsers they impact. You can click on the Frontend SPOF test links to see the actual test page.

Frontend SPOF test         | Chrome      | Firefox     | IE             | Opera       | Safari
External Script            | blank below | blank below | blank below    | blank below | blank below
Stylesheet                 | flash       | flash       | blank below    | flash       | blank below
inlined @font-face         | delayed     | flash       | flash          | flash       | delayed
Stylesheet with @font-face | delayed     | flash       | totally blank* | flash       | delayed
Script then @font-face     | delayed     | flash       | totally blank* | flash       | delayed

* Internet Explorer 9 does not display a blank page, but does “flash” the element.

The failure cases are highlighted in red. Here are the four possible outcomes sorted from worst to best:

  • totally blank – Nothing in the page is rendered – the entire page is blank.
  • blank below – All the DOM elements below the resource in question are not rendered.
  • delayed – Text that uses the @font-face style is invisible until the font file arrives.
  • flash – DOM elements are rendered immediately, and then redrawn if necessary after the stylesheet or font has finished downloading.

Web Performance avoids SPOF

It turns out that there are web performance best practices that, in addition to making your pages faster, also avoid most of these frontend single points of failure. Let’s look at the tests one by one.

External Script 
All browsers block rendering of elements below an external script until the script arrives and is parsed and executed. Since many sites put scripts in the HEAD, this means the entire page is typically blank. That’s why I believe the most important web performance coding pattern for today’s web sites is to load JavaScript asynchronously. Not only does this improve performance, but it avoids making external scripts a possible SPOF. 
Stylesheet 
Browsers are split on how they handle stylesheets. Firefox and Opera charge ahead and render the page, then redraw (flash) any elements whose styling changes once the stylesheet arrives. Chrome, Internet Explorer, and Safari delay rendering the page until the stylesheets have arrived. (Generally they only delay rendering elements below the stylesheet, but in some cases IE will delay rendering everything in the page.) If rendering is blocked and the stylesheet takes a long time to download, or times out, the user is left staring at a blank page. There’s not a lot of advice on loading stylesheets without blocking page rendering, primarily because doing so would introduce a flash of unstyled content.
inlined @font-face 
I’ve blogged before about the performance implications of using @font-face. When the @font-face style is declared in a STYLE block in the HTML document, the SPOF issues are dramatically reduced. Firefox, Internet Explorer, and Opera avoid making these custom font files a SPOF by rendering the affected text and then redrawing it after the font file arrives. Chrome and Safari don’t render the customized text at all until the font file arrives. I’ve drawn these cells in yellow since that could make the page unusable for users of those browsers, but most sites only use custom fonts on a subset of the page.
Stylesheet with @font-face 
Inlining your @font-face style is the key to avoiding having font files be a single point of failure. If you inline your @font-face styles and the font file takes forever to return or times out, the worst case is the affected text is invisible in Chrome and Safari. But at least the rest of the page is visible, and everything is visible in Firefox, IE, and Opera. Moving the @font-face style to a stylesheet not only slows down your site (by requiring two sequential downloads to render text), but it also creates a special case in Internet Explorer 7 & 8 where the entire page is blocked from rendering. IE 6 is only slightly better – the elements below the stylesheet are blocked from rendering (but if your stylesheet is in the HEAD this is the same outcome).
Script then @font-face 
Inlining your @font-face style isn’t enough to avoid the entire page SPOF that occurs in IE. You also have to make sure the inline STYLE block isn’t preceded by a SCRIPT tag. Otherwise, your entire page is blank in IE waiting for the font file to arrive. If that file is slow to return, your users are left staring at a blank page.

SPOF is bad

Five years ago most of the attention on web performance was focused on the backend. Since then we’ve learned that 80% of the time users wait for a web page to load is the responsibility of the frontend. I feel this same bias when it comes to identifying and guarding against single points of failure that can bring down a web site – the focus is on the backend and there’s not enough focus on the frontend. For larger web sites, the days of a single server, single router, single data center, and other backend SPOFs are way behind us. And yet, most major web sites include scripts and stylesheets in the typical way that creates a frontend SPOF. Even more worrisome – many of these scripts are from third parties for social widgets, web analytics, and ads.

Look at the scripts, stylesheets, and font files in your web page from a worst case scenario perspective. Ask yourself:

  • Is your web site’s availability dependent on these resources?
  • Is it possible that if one of these resources timed out, users would be blocked from seeing your site?
  • Are any of these single point of failure resources from a third party?
  • Would you rather embed resources in a way that avoids making them a frontend SPOF?

Make sure you’re aware of your frontend SPOFs, track their availability and latency closely, and embed them in your page in a non-blocking way whenever possible.
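
As a rough sketch, here’s one way to embed a third-party script so it can’t block rendering, and to beacon back when it’s slow (the URLs and threshold are made up for illustration; IE before version 9 needs onreadystatechange instead of onload):

var start = new Date().getTime();
var script = document.createElement('script');
script.src = 'http://widgets.example.com/widget.js';   // third-party resource
script.onload = function() {
  var elapsed = new Date().getTime() - start;
  if (elapsed > 2000) {
    // track slow third-party loads on your own servers
    new Image().src = '/slow-thirdparty?ms=' + elapsed;
  }
};
document.getElementsByTagName('head')[0].appendChild(script);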

Update Oct 12: Pat Meenan created a blackhole server that you can use to detect frontend SPOF in webpages.


appendChild vs insertBefore

May 11, 2010 12:15 am | 43 Comments

I’ve looked at a bunch of third party JavaScript snippets as part of my P3PC series. As I analyzed each of these snippets, I looked to see if scripts were being loaded dynamically. After all, this is a key ingredient for making third party content fast. It turns out nobody does dynamic loading the same way. I’d like to walk through some of the variations I found. It’s a story that touches on some of the most elegant and awful code out there, and is a commentary on the complexities of dealing with the DOM.

In early 2008 I started gathering techniques for loading scripts without blocking. I called the most popular technique the Script DOM Element approach. It’s pretty straightforward:

var domscript = document.createElement('script');
domscript.src = 'main.js';
document.getElementsByTagName('head')[0].appendChild(domscript);
Souders, May 2008

I worked with the Google Analytics team on their async snippet. The first version that came out in December 2009 also used appendChild, but instead of trying to find the HEAD element, they used a different technique for finding the parent. It turns out that not all web pages have a HEAD tag, and not all browsers will create one when it’s missing.

var ga = document.createElement('script');
ga.src = ('https:' == document.location.protocol ?
    'https://ssl' : 'http://www') +
    '.google-analytics.com/ga.js';
ga.setAttribute('async', 'true');
document.documentElement.firstChild.appendChild(ga);
Google Analytics, Dec 2009

Google Analytics is used on an incredibly diverse set of web pages, so there was lots of feedback that identified issues with using documentElement.firstChild. In February 2010 they updated the snippet with this pattern:

var ga = document.createElement('script');
ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 
    'https://ssl' : 'http://www') + 
    '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(ga, s);
Google Analytics, Feb 2010

I think this is elegant. If we’re dynamically loading scripts, we’re doing that with JavaScript, so there must be at least one SCRIPT element in the page. The Google Analytics async snippet has just come out of beta, so this pattern must be pretty rock solid.

I wanted to see how other folks were loading dynamic scripts, so I took a look at YUI Loader. It has an insertBefore variable that is used for stylesheets, so for scripts it does appendChild to the HEAD element:

if (q.insertBefore) {
  var s = _get(q.insertBefore, id);
  if (s) {
    s.parentNode.insertBefore(n, s);
  }
} else {
  h.appendChild(n);
}
YUI Loader 2.6.0, 2008

jQuery supports dynamic resource loading. Their code is very clean and elegant, and informative, too. In two pithy comments are pointers to bugs #2709 and #4378 which explain the issues with IE6 and appendChild.

head = document.getElementsByTagName("head")[0] || 
    document.documentElement;
// Use insertBefore instead of appendChild to circumvent an IE6 bug.
// This arises when a base node is used (#2709 and #4378).
head.insertBefore(script, head.firstChild);
jQuery

All of these implementations come from leading development teams, but what’s happening in other parts of the Web? Here’s a code snippet I came across while doing my P3PC Collective Media blog post:

var f=document.getElementsByTagName("script");
var b=f[f.length-1]; 
if(b==null){ return; }
var i=document.createElement("script");
i.language="javascript"; 
i.setAttribute("type","text/javascript");
var j=""; 
j+="document.write('');";
var g=document.createTextNode(j); 
b.parentNode.insertBefore(i,b);
appendChild(i,j);

function appendChild(a,b){
  if(null==a.canHaveChildren||a.canHaveChildren){
    a.appendChild(document.createTextNode(b));
  }
  else{ a.text=b;}
}
Collective Media, Apr 2010

Collective Media starts out in a similar way by creating a SCRIPT element. Similar to Google Analytics, it gets a list of SCRIPT elements already in the page, and chooses the last one in the list. Then insertBefore is used to insert the new dynamic SCRIPT element into the document.

Normally, this is when the script would start downloading (asynchronously), but in this case the src hasn’t been set. Instead, the script’s URL has been put inside a string of JavaScript code that does a document.write of a SCRIPT HTML tag. (If you weren’t nervous before, you should be now.) (And there’s more.) Collective Media creates a global function called, of all things, appendChild. The dynamic SCRIPT element and string of document.write code are passed to this custom version of appendChild, which injects the string of code into the SCRIPT element, causing it to be executed. The end result, after all this work, is an external script that gets downloaded in a way that blocks the page. It’s not even asynchronous!

I’d love to see Collective Media clean up their code. They’re so close to making it asynchronous and improving the page load time of anyone who includes their ads. But really, doesn’t this entire blog post seem surreal? To be discussing this level of detail and optimization for something as simple as adding a script element dynamically is a testimony to the complexity and idiosyncrasies of the DOM.
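
For contrast, here’s roughly what an asynchronous version of that last step could look like – set the script’s src directly instead of document.writing a SCRIPT tag into it (this is my sketch with a placeholder URL, not Collective Media’s code):

var scripts = document.getElementsByTagName('script');
var last = scripts[scripts.length - 1];
var ad = document.createElement('script');
ad.src = 'http://ads.example.com/ad.js';    // placeholder for the ad script URL
last.parentNode.insertBefore(ad, last);     // downloads without blocking the page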

In threads and discussions about adding simpler behavior to the browser, a common response I hear from browser developers is, “But site developers can do that now. We don’t have to add a new way of doing it.” Here we can see what happens without that simpler behavior. Hundreds, maybe even thousands of person hours are spent reinventing the wheel for some common task. And some dev teams end up down a bad path. That’s why I’ve proposed some clarifications to the ASYNC and DEFER attributes for scripts, and a new POSTONLOAD attribute.

I’m hopeful that HTML5 will include some simplifications for working with the DOM, especially when it comes to improving performance. Until then, if you’re loading scripts dynamically, I recommend using the latest Google Analytics pattern or the jQuery pattern. They’re the most bulletproof. And with the kinds of third party content I’ve seen out there, we need all the bulletproofing we can get.


WPO – Web Performance Optimization

May 7, 2010 12:35 am | 14 Comments

Everybody loves web performance

When I started evangelizing high performance web sites back in 2004, I felt like a lone voice in the woods. Fast forward six years to Fred Wilson speaking at the Future of Web Apps. Fred is a (the) top tech VC from NYC with investments in companies such as Twitter, del.icio.us, Etsy, and FeedBurner. He spoke about the 10 Golden Principles of Successful Web Apps. Guess what was #1 on his list?

First and foremost, we believe that speed is more than a feature. Speed is the most important feature. If your application is slow, people won’t use it. […]

We think that the application has to be fast, and if it’s not, you can see what happens. We have every single one of our portfolio company services on Pingdom, and we take a look at that every week. When we see some of our portfolio company’s applications getting bogged down, we also note that they don’t grow as quickly. There is real empirical evidence that substantiates the fact that speed is more than a feature. It’s a requirement.

What started as a list of performance tips coded up in a browser plug-in has evolved to the point where a “leading voice of the venture capital finance community in the nation’s largest city” is citing speed as the #1 principle for successful web apps.

Impact of performance on the bottom line

This is confirmation that what we set as the theme for Velocity 2009 – “the impact of performance on the bottom line” – was timely and impactful. I suggested that theme because after years of evangelizing web performance to the tech community I realized we needed to reach other parts of the organization (management, sales, marketing, etc.) to get support for the work needed to make web sites fast. Here are some of the now well known performance success stories that came from Velocity 2009 and afterward.

The major search engines measured how much web site slowdowns hurt their business metrics:

On the faster side, companies from a variety of vertical markets had praise for the benefits gained from improving performance:

Google, in their ongoing effort to make the Web faster, blogged last month that “we’ve decided to take site speed into account in our search rankings.” This is yet another way in which improving web performance will have a positive impact on the bottom line.

Web Performance Optimization – an emerging industry

This convergence of awareness, even urgency, on the business side and growing expertise in the tech community around web performance marks the beginning of a new industry that I’m calling “WPO” – Web Performance Optimization. WPO is similar to SEO in that optimizing web performance drives more traffic to your web site. But WPO doesn’t stop there. As evidenced by the success stories mentioned earlier, WPO also improves the user experience, increases revenue, and reduces operating costs.

Having just announced this new industry, let me be the first to give my predictions on what we’ll see in the near future. Here’s my top ten list done in Letterman fashion:

  1. Fast by default – To make it easier for developers, we’ll see performance best practices get built in to CMSs, templating languages (PHP, Python, etc.), clouds (AWS, Google App Engine), JavaScript libraries, and most importantly in major conduits of the Web – browsers, servers, and proxies. A lot of this is already happening, such as jQuery’s focus on performance and performance optimizations in Rails.
  2. Visibility into the browser – In order to make web pages faster, developers need the ability to find which parts are slow. This requires visibility into the time it takes for JavaScript execution, applying CSS, repainting, DOM manipulation, and more. We’re seeing early forays into this area with tools like Speed Tracer and dynaTrace Ajax Edition.
  3. Consolidation – Projects around web performance tools, metrics, and services have been disjoint efforts. That’s going to change. We’ll see tools that combine JavaScript debugging, JavaScript profiling, DOM inspection, network utilization, and more – all in one tool. Performance metrics will be aggregated in one dashboard, rather than having to visit multiple separate services. Consolidation will also happen at the company level, where smaller performance-related companies are acquired by larger consulting and services companies.
  4. TCP, HTTP – The network on which the Web works needs to be optimized. SPDY is one proposal. I also think we need to try to get more support for pipelining. Any improvements made to the underlying network will trickle down to every site and user on the Web.
  5. Standards – We’re going to see standards established in the areas of measuring performance, benchmarks, and testing. The Web Timing Spec is one example that exists today.
  6. Industry Organizations – Within the WPO industry we’ll see the growth of professional organizations, training, certification, standards bodies, and cooperatives. An example of a cooperative that came through my inbox today was a proposal for web publishers to share information about slow ads.
  7. Data – Monitoring performance and finding new performance opportunities requires analyzing data. I predict we’ll see public repositories of performance-related data made available. My favorite example that I’d love to see is an Internet Performance Archive, similar to the existing Internet Archive except that the IPA’s wayback machine would show the performance characteristics of a web site over time.
  8. green – Finally we’ll see studies conducted that quantify how improving web performance reduces power consumption and ultimately shrinks the Web’s carbon footprint.
  9. mobile – Mobile performance is at square one. We need to gather metrics, find the major performance pain points and their root causes, discover solutions, create tools, evangelize the newly discovered best practices, and collect new success stories.
  10. speed as a differentiator – Going forward, many of the decisions made around the Web will be based on performance. Customer device purchases, vendor selection, web site reviews, and user loyalty will all include performance as a major consideration.

There’s a lot of work to be done. It’s all going to be interesting and will greatly improve the Web that we use everyday. If you have the interest and time, contact me. There are tons of open source projects that need to be started. I look forward to working with you on making a faster web.


[This blog post is based on my presentation from Web 2.0 Expo. The slides from that talk are available as Powerpoint and on Slideshare.]


Call to improve browser caching

April 26, 2010 9:14 pm | 38 Comments

Over Christmas break I wrote Santa my browser wishlist. There was one item I neglected to ask for: improvements to the browser disk cache.

In 2007 Tenni Theurer and I ran an experiment to measure browser cache stats from the server side. Tenni’s write up, Browser Cache Usage – Exposed, is the stuff of legend. There she reveals that while 80% of page views were done with a primed cache, 40-60% of unique users hit the site with an empty cache at least once per day. 40-60% seems high, but I’ve heard similar numbers from respected web devs at other major sites.

Why do so many users have an empty cache at least once per day?

I’ve been racking my brain for years trying to answer this question. Here are some answers I’ve come up with:

  • first time users – Yea, but not 40-60%.
  • cleared cache – It’s true: more and more people are likely using anti-virus software that clears the cache between browser sessions. And since we ran that experiment back in 2007 many browsers have added options for clearing the cache frequently (for example, Firefox’s privacy.clearOnShutdown.cache option). But again, this doesn’t account for the 40-60% number.
  • flawed experiment – It turns out there was a flaw in the experiment (browsers ignore caching headers when an image is in memory), but this would only affect the 80% number, not the 40-60% number. And I expect the impact on the 80% number is small, given the fact that other folks have gotten similar numbers. (In a future blog post I’ll share a new experiment design I’ve been working on.)
  • resources got evicted – hmmmmm

OK, let’s talk about eviction for a minute. The two biggest influencers for a resource getting evicted are the size of the cache and the eviction algorithm. It turns out, the amount of disk space used for caching hasn’t kept pace with the size of people’s drives and their use of the Web. Here are the default disk cache sizes for the major browsers:

  • Internet Explorer: 8-50 MB
  • Firefox: 50 MB
  • Safari: everything I found said there isn’t a max size setting (???)
  • Chrome: < 80 MB (varies depending on available disk space)
  • Opera: 20 MB

Those defaults are too small. My disk drive is 150 GB of which 120 GB is free. I’d gladly give up 5 GB or more to raise the odds of web pages loading faster.

Even with more disk space, the cache is eventually going to fill up. When that happens, cached resources need to be evicted to make room for the new ones. Here’s where eviction algorithms come into play. Most eviction algorithms are LRU-based – the resource that was least recently used is evicted. However, our knowledge of performance pain points has grown dramatically in the last few years. Translating this knowledge into eviction algorithm improvements makes sense. For example, we’re all aware how much costlier it is to download a script than an image. (Scripts block other downloads and rendering.) Scripts, therefore, should be given a higher priority when it comes to caching.
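
To make the idea concrete, here’s a toy sketch of a type-aware eviction policy – evict the entry with the lowest score, where scripts get a retention bonus. It’s purely illustrative; it’s not how any browser’s cache actually works.

// each cache entry: { url: '...', type: 'script' | 'stylesheet' | 'image', lastUsed: timestamp }
function evictionScore(entry) {
  var bonus = (entry.type === 'script') ? 24 * 60 * 60 * 1000 : 0;   // treat scripts as a day "fresher"
  return entry.lastUsed + bonus;
}

function pickVictim(entries) {
  var victim = entries[0];
  for (var i = 1; i < entries.length; i++) {
    if (evictionScore(entries[i]) < evictionScore(victim)) {
      victim = entries[i];
    }
  }
  return victim;   // the entry to evict when the cache is full
}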

It’s hard to get access to gather browser disk cache stats, so I’m asking people to discover their own settings and share them via the Browser Disk Cache Survey form. I included this in my talks at JSConf and jQueryConf. ~150 folks at those conferences filled out the form. The data shows that 55% of people surveyed have a cache that’s over 90% full. (Caveats: this is a small sample size and the data is self-reported.) It would be great if you would take time to fill out the form. I’ve also started writing instructions for finding your cache settings.

I’m optimistic about the potential speedup that could result from improving browser caching, and fortunately browser vendors seem receptive (for example, the recent Mozilla Caching Summit). I expect we’ll see better default cache sizes and eviction logic in the next major release of each browser. Until then, jack up your defaults as described in the instructions. And please add comments for any browsers I left out or got wrong. Thanks.


P3PC: Collective Media

April 14, 2010 7:24 am | 8 Comments

P3PC is a project to review the performance of 3rd party content such as ads, widgets, and analytics. You can see all the reviews and stats on the P3PC home page. This blog post looks at Collective Media. Here are the summary stats.

impact on page | Page Speed | YSlow | doc.write | total reqs | total xfer size | JS ungzip | DOM elems | median Δ load time
big            | 86         | 90    | y         | 6          | 8 kB            | 9 kB      | 7         | na**
* Stats for ads only include the ad framework and not any ad content.
** It’s not possible to gather timing stats for snippets with live ads.
column definitions

I don’t have an account with Collective Media, so my friends over at Zimbio let me use their ad codes during my testing. Since these are live (paying) ads I can’t crowdsource time measurements for these ads.

Snippet Code

Let’s look at the actual snippet code:

1: <script type="text/javascript" >
2: document.write(unescape("%3Cscript src='http://a.collective-media.net/adj/cm.zimbio/picture;sz=300x250;ord=" +  Math.round(Math.random()*10000000) + "' type='text/javascript'%3E%3C/script%3E"));
3: </script>
4: <noscript><a href="http://a.collective-media.net/jump/cm.zimbio/picture;sz=300x250;ord=[timestamp]?" target="_blank"><img src="http://a.collective-media.net/ad/cm.zimbio/picture;sz=300x250;ord=[timestamp]?" width="300" height="250" border="0" alt=""></a></noscript>
snippet code as of April 13, 2010

Lines 1-3 use document.write to insert the a.collective-media.net/adj/cm.zimbio/picture script. Line 4 provides a NOSCRIPT block in case JavaScript is not available.

Performance Analysis

This HTTP waterfall chart was generated by WebPagetest.org using IE 7 with a 1.5Mbps connection from Dulles, VA. In my analysis of ad snippets I focus only on the ad framework, not on the actual ads. The Collective Media ad framework is composed of 6 HTTP requests: items 2, 3, 4, 5, 11&12, and 13.

Keep in mind that collective-media-waterfall.png represents the actual content on the main page. Notice how that image is pushed back to item 8 in the waterfall chart. In this one page load, this main content is blocked for 471 + 228 + 508 + 136 = 1343 milliseconds by the ad framework (and another 238 ms by the ad itself).

Let’s step through each request. The requests that are part of the ad framework are items 2-5 and 11-13.

  • item 1: compare.php – The HTML document.
  • item 2: a.collective-media.net/adj/cm.zimbio/picture – The main Collective Media script. This script is tiny – less than 400 bytes. It contains a document.write line that inserts the k.collective-media.net/cmadj/cm.zimbio/picture script (item 3).
  • item 3: k.collective-media.net/cmadj/cm.zimbio/picture – This was inserted as a script (by item 2). Instead of returning JavaScript code, it redirects to ak1.abmr.net/is/k.collective-media.net (item 4).
  • item 4: ak1.abmr.net/is/k.collective-media.net – This is a redirect from item 3 that itself redirects to k.collective-media.net/cmadj/cm.zimbio/picture (item 5).
  • item 5: k.collective-media.net/cmadj/cm.zimbio/picture – Most of the work of the Collective Media ad framework is done in this script. It dynamically inserts other scripts that contain the actual ad.
  • item 6: ad.doubleclick.net/adj/cm.zimbio/picture – A script that uses document.write to insert the actual ad.
  • item 7: adc_predlend_fear_300x250.jpg – The ad image.
  • item 8: collective-media-waterfall.png – The waterfall image representing the main page’s content.
  • item 9: favicon.ico – My site’s favicon.
  • item 10: cm.g.doubleclick.net/pixel – DoubleClick beacon.
  • item 11: l.collective-media.net/log – Collective Media beacon that fails.
  • item 12: l.collective-media.net/log – Retry of the Collective Media beacon.
  • item 13: a.collective-media.net/idpair – Another Collective Media beacon.

Items 2-5 are part of the ad framework. They have a dramatic impact on performance because of the way they’re daisy chained together:

Item 2 is a script that document.writes a request for item 3.
⇒ Item 3 redirects to item 4.
⇒ Item 4 redirects to item 5.
⇒ Item 5 document.writes a request for item 6.

All of these requests are performed sequentially. This is the main reason why the main content in the page (collective-media-waterfall.png) is delayed 1343 milliseconds.

Here are some of the performance issues with this snippet.

1. The redirects cause sequential downloads.

A redirect is almost as bad as a script when it comes to blocking. The redirect from k.collective-media.net (item 3) to ak1.abmr.net (item 4) and back to k.collective-media.net (item 5) forces those resources to download sequentially. It would be better to avoid the redirects if possible.

2. The scripts block the main content of the page from loading.

It would be better to load the script without blocking, similar to what BuySellAds.com does. In this case there are two blocking scripts that are part of the ad framework (and more that are part of the actual ad).

3. The ad is inserted using document.write.

Scripts that use document.write slow down the page because they can’t be loaded asynchronously. Inserting ads into a page without using document.write can be tricky. BuySellAds.com solves this problem by creating a DIV with the desired width and height to hold the ad, and then setting the DIV’s innerHTML.
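
The general shape of that approach looks something like this (a sketch, not BuySellAds’ actual code):

// the publisher's page reserves a fixed-size placeholder where the ad belongs:
//   <div id="ad_slot" style="width:300px;height:250px;"></div>

// the ad script is loaded asynchronously...
var s = document.createElement('script');
s.src = 'http://ads.example.com/ad.js';    // placeholder URL
document.getElementsByTagName('head')[0].appendChild(s);

// ...and ad.js fills the placeholder when it arrives, instead of document.writing
function fillAdSlot(adHtml) {
  document.getElementById('ad_slot').innerHTML = adHtml;
}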

4. The beacon returns a 200 HTTP status code.

I recommend returning a 204 (No Content) status code for beacons. A 204 response has no body and browsers will never cache them, which is exactly what we want from a beacon. In this case, the image body is less than 100 bytes. Although the savings are minimal, using a 204 response for beacons is a good best practice.
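
For example, a beacon endpoint can log the hit and send back nothing at all. A minimal sketch in Node.js (an illustration, not Collective Media’s server):

var http = require('http');

http.createServer(function(req, res) {
  // record whatever the beacon carries (query string, referrer, etc.)
  console.log(new Date().getTime(), req.url);
  res.writeHead(204);   // No Content: no body to download, nothing for the browser to cache
  res.end();
}).listen(8080);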


There’s one other part of the Collective Media ad framework I’d like to delve into: how scripts are loaded.

The code snippet given to publishers loads the initial script using document.write. It appears this is done to inject a random number into the URL, as opposed to using Cache-Control headers:

1: <script type="text/javascript" >
2: document.write(unescape("%3Cscript src='http://a.collective-media.net/adj/cm.zimbio/picture;sz=300x250;ord=" +  Math.round(Math.random()*10000000) + "' type='text/javascript'%3E%3C/script%3E"));
3: </script>

That initial script (item 2 in the waterfall chart) returns just one line of JavaScript that does another document.write to insert a script (item 3), again inserting a random number into the URL:

1: document.write('<scr'+'ipt language="javascript" src="http://k.collective-media.net/cmadj/cm.zimbio/picture;sz=300x250;ord=2381217;ord1=' +Math.floor(Math.random() * 1000000) + ';cmpgurl='+escape(escape(cmPageURL))+'?">');  document.write('</scr'+'ipt>');

That script request (item 3) is a redirect which leads to another redirect (item 4), which returns a heftier script (item 5) that starts to insert the actual ad. At the end of this script is a call to CollectiveMedia.createAndAttachAd. Here’s that function (unminified):

1: createAndAttachAd:function(h,c,a,d,e){
2:     var f=document.getElementsByTagName("script");
3:     var b=f[f.length-1];
4:     if(b==null){ return; }
5:     var i=document.createElement("script");
6:     i.language="javascript";
7:     i.setAttribute("type","text/javascript");
8:     var j="";
9:     j+="document.write('<scr'+'ipt language=\"javascript\" src=\""+c+"\"></scr'+'ipt>');";
10:     var g=document.createTextNode(j);
11:     b.parentNode.insertBefore(i,b);
12:     appendChild(i,j);
13:     if(e){
14:         var k=new cmIV_();
15:         k._init(h,i.parentNode,a,d);
16:     }
17: },

In lines 2-7 & 11 a script element is created and inserted into the document. It’s debatable if you need to set the language (line 6) and type (line 7), but that’s minor. Using insertBefore instead of appendChild is a new pattern I’ve just started seeing that is more robust, so it’s nice to see that here. Lines 8-9 create a string of JavaScript to insert an external script using document.write. This could be one line, but again, that’s minor.

Then things get a little strange. Line 10 creates a text node element (“g”) that’s never used. In line 11 the script element is inserted into the document. Then a home-built version of appendChild is called. This function is added to the global namespace (ouch). Here’s what that function looks like:

1: function appendChild(a,b){
2:     if(null==a.canHaveChildren||a.canHaveChildren){
3:         a.appendChild(document.createTextNode(b));
4:     }
5:     else{
6:         a.text=b;
7:     }
8: }

OK. To wrap this up: A script element is created dynamically and inserted in the document. Then a string of JavaScript is injected into this script element. That line of JavaScript document.writes an external script request into the page. If that seems convoluted to you, you’re not alone. It took me a while to wrap my head around this.

A cleaner approach would be to set the SRC property of the dynamic script element, rather than document.writing the script into the page. This would reduce the amount of code (small win), but more importantly avoiding document.write opens the door for loading ads asynchronously. This is what’s required to reach a state where ad content and publisher content co-exist equally in web pages.
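
Here’s roughly what that cleaner version could look like – the same insertBefore pattern, but with the src set directly so the ad script downloads asynchronously (my sketch, not Collective Media’s code):

function createAndAttachAdAsync(src) {
  var scripts = document.getElementsByTagName('script');
  var last = scripts[scripts.length - 1];
  if (last == null) { return; }
  var ad = document.createElement('script');
  ad.src = src;                              // no document.write, no injected code string
  last.parentNode.insertBefore(ad, last);    // non-blocking download
}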


P3PC: Glam Media

April 13, 2010 10:45 am | 3 Comments

P3PC is a project to review the performance of 3rd party content such as ads, widgets, and analytics. You can see all the reviews and stats on the P3PC home page. This blog post looks at Glam Media. Here are the summary stats.

impact on page | Page Speed | YSlow | doc.write | total reqs | total xfer size | JS ungzip | DOM elems | median Δ load time
big            | 89         | 83    | y         | 11         | 68 kB           | 63 kB     | 7         | na**
* Stats for ads only include the ad framework and not any ad content.
** It’s not possible to gather timing stats for snippets with live ads.
column definitions

I don’t have an account with Glam Media, so my friends over at Zimbio let me use their ad codes during my testing. Since these are live (paying) ads I have to mask the ad codes in the snippet shown here. This means it’s not possible to crowdsource time measurements for these ads.

Snippet Code

Let’s look at the actual snippet code:

1: <script type="text/javascript" language="javascript" src="http://www2.glam.com/app/site/affiliate/viewChannelModule.act?mName=viewAdJs&affiliateId=123456789&adSize=300x250&zone=Marketplace">
2: </script>
snippet code as of April 12, 2010

The Glam Media ad is kicked off from a single script: viewChannelModule.act. This script is loaded using normal SCRIPT SRC tags, which causes blocking in IE7 and earlier.

Performance Analysis

This HTTP waterfall chart was generated by WebPagetest.org using IE 7 with a 1.5Mbps connection from Dulles, VA. In my analysis of ad snippets I focus only on the ad framework, not on the actual ads. The Glam Media ad framework alone constitutes 9 HTTP requests.

Let’s step through each request.

  • item 1: compare.php – The HTML document.
  • item 2: viewChannelModule.act – The main Glam Media script.
  • item 3: ad.doubleclick.net – The actual ad (not included in my analysis).
  • item 4: glamadapt_jsrv.act – Script loaded by viewChannelModule.act using document.write.
  • item 5: quant.js – Quantcast script loaded by viewChannelModule.act using document.write.
  • item 6: beacon.js – ScorecardResearch script loaded by viewChannelModule.act using document.write.
  • item 7: glam_comscore.js – Script loaded by viewChannelModule.act using document.write.
  • item 8: pixel – Beacon sent by quant.js.
  • item 9: b.scorecardresearch.com/b – Beacon sent by glam_comscore.js. This returns a redirect to /b2 (item 12).
  • item 10: glam-media-waterfall.png – The image representing the main page’s content.
  • item 11: altfarm.mediaplex.com/ad/js/ – The actual ad (not included in my analysis).
  • item 12: b.scorecardresearch.com/b2 – Another beacon sent as a result of the redirect from /b (item 9).

Keep in mind that glam-media-waterfall.png represents the actual content on the main page. Notice how that image is pushed back to item 10 in the waterfall chart. In this one page load, this main content is blocked for 617 + 808 = 1425 milliseconds. Here are some of the performance issues with this snippet.

1. Too many HTTP requests.

9 HTTP requests for an ad framework (not counting the ad itself) is a lot. The fact that these come from a variety of different services exacerbates the problem because more DNS lookups are required. These 9 HTTP requests are served from 6 different domains.

2. The scripts block the main content of the page from loading.

It would be better to load the script without blocking, similar to what BuySellAds.com does.

3. The ad is inserted using document.write.

Scripts that use document.write slow down the page because they can’t be loaded asynchronously. Inserting ads into a page without using document.write can be tricky. BuySellAds.com solves this problem by creating a DIV with the desired width and height to hold the ad, and then setting the DIV’s innerHTML.

4. The redirects cause sequential downloads.

A redirect is almost as bad as a script when it comes to blocking. The redirect from b.scorecardresearch.com/b to /b2 causes those two resources to happen sequentially. It would be better to avoid the redirect if possible.

5. Some resources aren’t cacheable.

glam_comscore.js has no caching headers, and yet its Last-Modified date is Nov 19, 2009 (almost 5 months ago). quant.js is only cacheable for 1 day.
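
For a static script like this, the usual fix is a far-future expiration plus a versioned filename so returning users never pick up a stale copy – something along these lines (illustrative values, not Glam’s actual configuration):

Cache-Control: public, max-age=31536000

The versioned filename (glam_comscore.v2.js, say – a made-up name) is what makes such aggressive caching safe: when the script changes, the URL changes, so users fetch the new copy on their next visit.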


Much of the content in this snippet is served with good performance characteristics. The scripts are compressed and minified. One of the beacons returns a 204 No Content response, which is a nice performance optimization. But the sheer number of HTTP requests, use of document.write, and scripts loaded in a blocking fashion cause the page to load more slowly.
