Prebrowsing

November 7, 2013 2:41 pm | 20 Comments

A favorite character from the MASH TV series is Corporal Walter EugeneÂ O’Reilly, fondly referred to as “Radar” for his knack of anticipating events before they happen. Radar was a rare example of efficiency because he was able to carry out Lt. Col. Blake’s wishes before Blake had even issued the orders.

What if the browser could do the same thing? What if it anticipated the requests the user was going to need, and could complete those requests ahead of time?Â If this was possible, the performance impact would be significant.Â Even if just the few critical resources needed were already downloaded, pages would render much faster.

Browser cache isn’t enough

You might ask, “isn’t this what the cache is for?”Â Yes! In many cases when you visit a website the browser avoids making costly HTTP requests and just reads the necessary resources from disk cache.Â But there are many situations when the cache offers no help:

first visit – The cache only comes into play on subsequent visits to a site.Â The first time you visit a site it hasn’t had time to cache any resources.
cleared – The cache gets cleared more than you think.Â In addition to occasional clearing by the user, the cache can also be cleared by anti-virus software and browser bugs.Â (19% of Chrome users have their cache cleared at least once a week due to a bug.)
purged – Since the cache is shared by every website the user visits, it’s possible for one website’s resources to get purged from the cache to make room for another’s.
expired – 69% of resources don’t have any caching headers or are cacheable for less than one day.Â If the user revisits these pages and the browser determines the resource is expired, an HTTP request is needed to check for updates.Â Even if the response indicates the cached resource is still valid, these network delays still make pages load more slowly, especially on mobile.
revved – Even if the website’s resources are in the cache from a previous visit, the website might have changed and uses different resources.

Something more is needed.

Prebrowsing techniques

In their quest to make websites faster, today’s browsers offer a number of features for doing work ahead of time.Â These “prebrowsing” (short for “predictive browsing” – a word I made up and a domain I own) techniques include:

<link rel="dns-prefetch" ...>
<link rel="prefetch" ...>
<link rel="prerender" ...>
DNS pre-resolution
TCP pre-connect
prefreshing
the preloader

These features come into play at different times while navigating web pages. I break them into these three phases:

previous page – If a web developer has high confidence about which page you’ll go to next, they can use LINK REL dns-prefetch, prefetch or prerender on the previous page to finish some work needed for the next page.
transition – Once you navigate away from the previous page there’s a transition period after the previous page is unloaded but before the first byte of the next page arrives. During this time the web developer doesn’t have any control, but the browser can work in anticipation of the next page by doing DNS pre-resolution and TCP pre-connects, and perhaps even prefreshingÂ resources.
current page – As the current page is loading, browsers have a preloader that scans the HTML for downloads that can be started before they’re needed.

Let’s look at each of the prebrowsing techniques in the context of each phase.

Phase 1 – Previous page

As with any of this anticipatory work, there’s a risk that the prediction is wrong. If the anticipatory work is expensive (e.g., steals CPU from other processes, consumes battery, or wastes bandwidth) then caution is warranted. It would seem difficult to anticipate which page users will go to next, but high confidence scenarios do exist:

If the user has done a search with an obvious result, that result page is likely to be loaded next.
If the user navigated to a login page, the logged-in page is probably coming next.
If the user is reading a multi-page article or paginated set of results, the page after the current page is likely to be next.

Let’s take the example ofÂ searching for Adventure TimeÂ to illustrate how different prebrowsing techniques can be used.

DNS-PREFETCH

If the userÂ searched for Adventure TimeÂ then it’s likely the user will click on the result for Cartoon Network, in which case we can prefetch the DNS like this:

<link rel="dns-prefetch" href="//cartoonnetwork.com">

DNS lookups are very low cost – they only send a few hundred bytes over the network – so there’s not a lot of risk. But the upside can be significant. This studyÂ from 2008 showed a median DNS lookup time of ~87 ms and a 90th percentile of ~539 ms. DNS resolutions might be faster now. You can see your own DNS lookup times by going toÂ chrome://histograms/DNS (in Chrome) and searching for the DNS.PrefetchResolution histogram. Across 1325 samples my median is 50 ms with an average of 236 ms – ouch!

In addition to resolving the DNS lookup, some browsers may go one step further and establish a TCP connection. In summary, using dns-prefetch can save a lot of time, especially for redirects and on mobile.

PREFETCH

If we’re more confident that the user will navigate to the Adventure Time page and we know some of its critical resources, we can download those resources early using prefetch:

<link rel="prefetch" href="http://cartoonnetwork.com/utils.js">

This is great, but the spec is vague, so it’s not surprising that browser implementations behave differently. For example,

Firefox downloads just one prefetch item at a time, while Chrome prefetches up to ten resources in parallel.
Android browser, Firefox, and Firefox mobile start prefetch requests after window.onload, but Chrome and Opera start them immediately possibly stealing TCP connections from more important resources needed for the current page.
An unexpected behavior is that all the browsers that support prefetch cancel the request when the user transitions to the next page. This is strange because the purpose of prefetch is to get resources for the next page, but there might often not be enough time to download the entire response. Canceling the request means the browser has to start over when the user navigates to the expected page. A possible workaround is to add theÂ “Accept-Ranges: bytes” header so that browsers can resume the request from where it left off.

It’s best to prefetch the most important resources in the page: scripts, stylesheets, and fonts. Only prefetch resources that are cacheable – which means that you probably should avoid prefetching HTML responses.

PRERENDER

If we’re really confident the user is going to the Adventure Time page next, we can prerender the page like this:

<link rel="prerender" href="http://cartoonnetwork.com/">

This is like opening the URL in a hidden tab – all the resources are downloaded, the DOM is created, the page is laid out, the CSS is applied, the JavaScript is executed, etc. If the user navigates to the specified href, then the hidden page is swapped into view making it appear to load instantly. Google Search has had this feature for years under the name Instant Pages. Microsoft recently announced they’re going to similarly useÂ prerender in Bing on IE11.

Many pages use JavaScript for ads, analytics, and DHTML behavior (start a slideshow, play a video) that don’t make sense when the page is hidden. Website owners can workaround this issue by using the page visibility API to only execute that JavaScript once the page is visible.

Support for dns-prefetch, prefetch, and prerender is currently pretty spotty.Â The following table shows the results crowdsourced from myÂ prebrowsing tests. You can see the full resultsÂ here. Just as the IE team announced upcoming support for prerender, I hope other browsers will see the value of these features and add support as well.

	dns-prefetch	prefetch	prerender
Android 4		4
Chrome	22+	31+¹	22+
Chrome Mobile	29+
Firefox	22+²	23+²
Firefox Mobile	24+	24+
IE	11³	11³	11³
Opera		15+

¹ Need to use the --prerender=enabled commandline option.
² My friend at Mozilla said these features have been present since version 12.
³ This is based on a Bing blog post. It has not been tested.

Ilya Grigorik‘s High Performance Networking in Google Chrome is a fantastic source of information on these techniques, including many examples of how to see them in action in Chrome.

Phase 2 – Transition

When the user clicks a link the browser requests the next page’s HTML document. At this point the browser has to wait for the first byte to arrive before it can start processing the next page. The time-to-first-byte (TTFB) is fairly long – data from the HTTP Archive in BigQuery indicate a median TTFB of 561 ms and a 90th percentile of 1615 ms.

During this “transition” phase the browser is presumably idle – twiddling its thumbs waiting for the first byte of the next page. But that’s not so! Browser developers realized that this transition timeÂ is aÂ HUGE window of opportunityÂ for performance prebrowsing optimizations. Once the browser starts requesting a page, it doesn’t have to wait for that page to arrive to start working. Just like Radar, the browser can anticipate what will need to be done next and can start that work ahead of time.

DNS pre-resolution & TCP pre-connect

The browser doesn’t have a lot of context to go on – all it knows is the URL being requested, but that’s enough to do DNS pre-resolution and TCP pre-connect.Â Browsers can reference prior browsing history to find clues about the DNS and TCP work that’ll likely be needed. For example, suppose the user is navigating toÂ http://cartoonnetwork.com/. From previous history the browser can remember what other domains were used by resources in that page. You can see this information in Chrome at chrome://dns. My history shows the following domains were seen previously:

ads.cartoonnetwork.com
gdyn.cartoonnetwork.com
i.cdn.turner.com

During this transition (while it’s waiting for the first byte of Cartoon Network’s HTML document to arrive) the browser can resolve these DNS lookups. This is a low cost exercise that has significant payoffs as we saw in the earlier dns-prefetch discussion.

If the confidence is high enough, the browser can go a step further and establish a TCP connection (or two) for each domain. This will save time when the HTML document finally arrives and requires page resources. The Subresource PreConnects column in chrome://dnsÂ indicates when this occurs.Â For more information about dns-presolution and tcp-preconnect seeÂ DNS Prefetching.

Prefresh

Similar to the progression from LINK REL dns-prefetch to prefetch, the browser can progress from DNS lookups to actual fetching of resources that are likely to be needed by the page. The determination of which resources to fetch is based on prior browsing history, similar to what is done in DNS pre-resolution. This is implemented as an experimental feature in Chrome called “prefresh” that can be turned on using theÂ --speculative-resource-prefetching="enabled" flag. You can see the resources that are predicted to be needed for a given URL by going to chrome://predictors and clicking on the Resource Prefetch Predictor tab.

The resource history records which resources were downloaded in previous visits to the same URL, how often the resource was hit as well as missed, and a score for the likelihood that the resource will be needed again. Based on these scores the browser can start downloading critical resources while it’s waiting for the first byte of the HTML document to arrive. Prefreshed resources are thus immediately available when the HTML needs them without the delays to fetch, read, and preprocess them. The implementation of prefresh is still evolving and being tested, but it holds potential to be another prebrowsing timesaver that can be utilized during the transition phase.

Phase 3 – Current Page

Once the current page starts loading there’s not much opportunity to do prebrowsing – the user has already arrived at their destination. However, given that the average page takes 6+ seconds to load, there is a benefit in finding all the necessary resources as early as possible and downloading them in a prioritized order. This is the role of the preloader.

Most of today’s browsers utilize a preloader – also called a lookahead parser or speculative parser. The preloader is, in my opinion, the most important browser performance optimization ever made. One study found that the preloader alone improved page load times by ~20%. The invention of preloaders was in response to the old browser behavior where scripts were downloaded one-at-a-time in daisy chain fashion.

Starting with IE 8, parsing the HTML document was modified such that it forked when an external SCRIPT SRC tag was hit: the main parser is blocked waiting for the script to download and execute, but the lookahead parser continues parsing the HTML only looking for tags that might generate HTTP requests (IMG, SCRIPT, LINK, IFRAME, etc.). The lookahead parser queues these requests resulting in a high degree of parallelized downloads.Â Given that the average web page today hasÂ 17 external scripts, you can imagine what page load times would be like if they were downloaded sequentially. Being able to download scripts and other requests in parallel results in much faster pages.

The preloader has changed the logic of how and when resources are requested. These changes can be summarized by the goal of loading critical resources (scripts and stylesheets) early while loading less critical resources (images) later. This simple goal can produce some surprising results that web developers should keep in mind. For example:

JS responsive images get queued last – I’ve seen pages that had critical (bigger) images that were loaded using a JavaScript responsive images technique, while less critical (smaller) images were loaded using a normal IMG tag. Most of the time I see these images being downloaded from the same domain. The preloader looks ahead for IMG tags, sees all the less critical images, and adds those to the download queue for that domain. Later (after DOMContentLoaded) the JavaScript responsive images technique kicks in and adds the more critical images to the download queue – behind the less critical images! This is often not the expected nor desired behavior.
scripts “at the bottom” get loaded “at the top” – A rule I promoted starting in 2007 is to move scripts to the bottom of the page. In the days before preloaders this would ensure that all the requests higher in the page, including images, got downloaded first – a good thing when the scripts weren’t needed to render the page. But most preloaders give scripts a higher priority than images. This can result in a script at the bottom stealing a TCP connection from an image higher in the page causing above-the-fold rendering to take longer.

When it comes to the preloader the bottomline is that the preloader is a fantastic performance optimization for browsers, but the logic is new and still evolving so web developers should be aware of how the preloader works and watch their pages for any unexpected download behavior.

As the low hanging fruit of web performance optimization is harvested, we have to look harder to find the next big wins. Prebrowsing is an area that holds a lot of potential to deliver pages instantly. Web developers and browser developers have the tools at their disposal and some are taking advantage of them to create these instant experiences. I hope we’ll see even wider browser support for these prebrowsing features, as well as wider adoption by web developers.

[Here are the slides and video of my Prebrowsing talk from Velocity New York 2013.]

20 Comments

Frontend SPOF

June 1, 2010 7:49 pm | 9 Comments

My evangelism of high performance web sites started off in the context of quality code and development best practices. It’s easy for a style of coding to permeate throughout a company. Developers switch teams. Code is copied and pasted (especially in the world of web development). If everyone is developing in a high performance way, that’s the style that will characterize how the company codes.

This argument of promoting development best practices gained traction in the engineering quarters of the companies I talked to, but performance improvements continued to get backburnered in favor of new features and content that appealed to the business side of the organization. Improving performance wasn’t considered as important as other changes. Everyone assumed users wanted new features and that’s what got the most attention.

It became clear to me that we needed to show a business case for web performance. That’s why the theme for Velocity 2009 was “the impact of performance on the bottom line”. Since then there have been numerous studies released that have shown that improving performance does improve the bottom line. As a result, I’m seeing the business side of many web companies becoming strong advocates for Web Performance Optimization.

But there are still occasions when I have a hard time convincing a team that focusing on web performance, specifically frontend performance, is important. Shaving off hundreds (or even thousands) of milliseconds just doesn’t seem worthwhile to them. That’s when I pull out the big guns and explain that loading scripts and stylesheets in the typical way creates a frontend single point of failure that can bring down the entire site.

Examples of Frontend SPOF

The thought that simply adding a script or stylesheet to your web page could make the entire site unavailable surprises many people. Rather than focusing on CSS mistakes and JavaScript errors, the key is to think about what happens when a resource request times out. With this clue, it’s easy to create a test case:

<html>
<head>
<script src="http://www.snippet.com/main.js" type="text/javascript">
  </script>
</head>
<body>
Here's my page!
</body>
</html>

This HTML page looks pretty normal, but if snippet.com is overloaded the entire page is blank waiting for main.js to return. This is true in all browsers.

Here are some examples of frontend single points of failure and the browsers they impact. You can click on the Frontend SPOF test links to see the actual test page.

Frontend SPOF test	Chrome	Firefox	IE	Opera	Safari
External Script	blank below	blank below	blank below	blank below	blank below
Stylesheet	flash	flash	blank below	flash	blank below
inlined @font-face	delayed	flash	flash	flash	delayed
Stylesheet with @font-face	delayed	flash	totally blank*	flash	delayed
Script then @font-face	delayed	flash	totally blank*	flash	delayed

* Internet Explorer 9 does not display a blank page, but does “flash” the element.

The failure cases are highlighted in red. Here are the four possible outcomes sorted from worst to best:

totally blank – Nothing in the page is rendered – the entire page is blank.
blank below – All the DOM elements below the resource in question are not rendered.
delayed – Text that uses the @font-face style is invisible until the font file arrives.
flash – DOM elements are rendered immediately, and then redrawn if necessary after the stylesheet or font has finished downloading.

Web Performance avoids SPOF

It turns out that there are web performance best practices that, in addition to making your pages faster, also avoid most of these frontend single points of failure. Let’s look at the tests one by one.

External ScriptÂ: All browsers block rendering of elements below an external script until the script arrives and is parsed and executed. Since many sites put scripts in the HEAD, this means the entire page is typically blank. That’s why I believe the most important web performance coding pattern for today’s web sites is to load JavaScript asynchronously. Not only does this improve performance, but it avoids making external scripts a possible SPOF.Â
StylesheetÂ: Browsers are split on how they handle stylesheets. Firefox and Opera charge ahead and render the page, and then flash the user if elements have to be redrawn because their styling changed. Chrome, Internet Explorer, and Safari delay rendering the page until the stylesheets have arrived. (Generally they only delay rendering elements below the stylesheet, but in some cases IE will delay rendering everything in the page.) If rendering is blocked and the stylesheet takes a long time to download, or times out, the user is left staring at a blank page. There’s not a lot of advice on loading stylesheets without blocking page rendering, primarily because it would introduce the flash of unstyled content.
inlined @font-faceÂ: I’ve blogged before about the performance implications of using @font-face. When the @font-face style is declared in a STYLE block in the HTML document, the SPOF issues are dramatically reduced. Firefox, Internet Explorer, and Opera avoid making these custom font files a SPOF by rendering the affected text and then redrawing it after the font file arrives. Chrome and Safari don’t render the customized text at all until the font file arrives. I’ve drawn these cells in yellow since it could cause the page to be unusable for users using these browsers, but most sites only use custom fonts on a subset of the page.
Stylesheet with @font-faceÂ: Inlining your @font-face style is the key to avoiding having font files be a single point of failure. If you inline your @font-face styles and the font file takes forever to return or times out, the worst case is the affected text is invisible in Chrome and Safari. But at least the rest of the page is visible, and everything is visible in Firefox, IE, and Opera. Moving the @font-face style to a stylesheet not only slows down your site (by requiring two sequential downloads to render text), but it also creates a special case in Internet Explorer 7 & 8 where the entire page is blocked from rendering. IE 6 is only slightly better – the elements below the stylesheet are blocked from rendering (but if your stylesheet is in the HEAD this is the same outcome).
Script then @font-faceÂ: Inlining your @font-face style isn’t enough to avoid the entire page SPOF that occurs in IE. You also have to make sure the inline STYLE block isn’t preceded by a SCRIPT tag. Otherwise, your entire page is blank in IE waiting for the font file to arrive. If that file is slow to return, your users are left staring at a blank page.

SPOF is bad

Five years ago most of the attention on web performance was focused on the backend. Since then we’ve learned that 80% of the time users wait for a web page to load is the responsibility of the frontend. I feel this same bias when it comes to identifying and guarding against single points of failure that can bring down a web site – the focus is on the backend and there’s not enough focus on the frontend. For larger web sites, the days of a single server, single router, single data center, and other backend SPOFs are way behind us. And yet, most major web sites include scripts and stylesheets in the typical way that creates a frontend SPOF. Even more worrisome – many of these scripts are from third parties for social widgets, web analytics, and ads.

Look at the scripts, stylesheets, and font files in your web page from a worst case scenario perspective. Ask yourself:

Is your web site’s availability dependent on these resources?
Is it possible that if one of these resources timed out, users would be blocked from seeing your site?
Are any of these single point of failure resources from a third party?
Would you rather embed resources in a way that avoids making them a frontend SPOF?

Make sure you’re aware of your frontend SPOFs, track their availability and latency closely, and embed them in your page in a non-blocking way whenever possible.

Update Oct 12: Pat Meenan created a blackhole server that you can use to detect frontend SPOF in webpages.

9 Comments

Browser Performance Wishlist

February 15, 2010 4:25 pm | 28 Comments

What are the most important changes browsers could make to improve performance?

This document is my answer to that question. This is mainly for browser developers, although web developers will want to track the adoption of these improvements.

download scripts without blocking
SCRIPT attributes
resource packages
border-radius
cache redirects
link prefetch
Web Timing spec
remote JS debugging
Web Sockets
History
anchor ping
progressive XHR
stylesheet & inline JS
SCRIPT DEFER for inline scripts
@import improvements
@font-face improvements
stylesheets & iframes
paint events
missing schema, double downloads

Before digging into the list I wanted to mention two items that would actually be at the top of the list if it wasn’t for how new they are: SPDY and FRAG tag. Both of these require industry adoption and possible changes to specifications, so it’s too soon to put them on an implementation wishlist. I hope these ideas gain consensus soon and to facilitate that I describe them here.

SPDY: SPDY is a proposal from Google for making three major improvements to HTTP: compressed headers, multiplexed requests, and prioritized responses. Initial studies showed 25 top sites were loaded 55% faster. Server and client implementations are available, and some other organizations and individuals have completed server and client implementations. The protocol draft has been published for review.
FRAG tag: The idea behind this “document fragment” tag is that it be used to wrap 3rd party content – ads, widgets, and analytics. 3rd party content can have a severe impact on the containing page’s performance due to additional HTTP requests, scripts that block rendering and downloads, and added DOM nodes. Many of these factors can be mitigated by putting the 3rd party content inside an iframe embedded in the top level HTML document. But iframes have constraints and drawbacks – they typically introduce another HTTP request for the iframe’s HTML document, not all 3rd party code snippets will work inside an iframe without changes (e.g., references to “document” in JavaScript might need to reference the parent document), and some snippets (expando ads, suggest) can’t float over the main page’s elements. Another path to mitigate these issues is to load the JavaScript asynchronously, but many of these widgets use document.write and so must be evaluated synchronously.A compromise is to place 3rd party content in the top level HTML document wrapped in a FRAG block. This approach degrades nicely – older browsers would ignore the FRAG tag and handle these snippets the same way they do today. Newer browsers would parse the HTML in a separate document fragment. The FRAG content would not block the rendering of the top level document. Snippets containing document.write would work without blocking the top level document. This idea just started getting discussed in January 2010. Much more use case analysis and discussion is needed, culminating in a proposed specification. (Credit to Alex Russell for the idea and name.)

The List

The performance wishlist items are sorted highest priority first. The browser icons indicate which browsers need to implement that particular improvement.

download scripts without blocking

In older browsers, once a script started downloading all subsequent downloads were blocked until the script returned. It’s critical that scripts be evaluated in the order specified, but they can be downloaded in parallel. This has a significant improvement on page load times, especially for pages with multiple scripts. Newer browsers (IE8, Firefox 3.5+, Safari 4, Chrome 2+) incorporated this parallel script loading feature, but it doesn’t work as proactively as it could. Specifically:

IE8 – downloading scripts blocks image and iframe downloads
Firefox 3.6 – downloading scripts blocks iframe downloads
Safari 4 – downloading scripts blocks iframe downloads
Chrome 4 – downloading scripts blocks iframe downloads
Opera 10.10 – downloading scripts blocks all downloads

(test case, see the four “|| Script [Script|Stylesheet|Image|Iframe]” tests)

SCRIPT attributes

The HTML5 specification describes the ASYNC and DEFER attributesfor the SCRIPT tag, but the implementation behavior is not specified. Here’s how the SCRIPT attributes should work.

DEFER – The HTTP request for a SCRIPT with the DEFER attribute is not made until all other resources in the page on the same domain have already been sent. This is so that it doesn’t occupy one of the limited number of connections that are opened for a single server. Deferred scripts are downloaded in parallel, but are executed in the order they occur in the HTML document, regardless of what order the responses arrive in. The window’s onload event fires after all deferred scripts are downloaded and executed.
ASYNC – The HTTP request for a SCRIPT with the ASYNC attribute is made immediately. Async scripts are executed as soon as the response is received, regardless of the order they occur in the HTML document. The window’s onload event fires after all async scripts are downloaded and executed.
POSTONLOAD – This is a new attribute I’m proposing. Postonload scripts don’t start downloading until after the window’s onload event has fired. By default, postonload scripts are evaluated in the order they occur in the HTML document. POSTONLOAD and ASYNC can be used in combination to cause postonload scripts to be evaluated as soon as the response is received, regardless of the order they occur in the HTML document.

resource packages

Each HTTP request has some overhead cost. Workarounds include concatenating scripts, concatenating stylesheets, and creating image sprites. But this still results in multiple HTTP requests. And sprites are especially difficult to create and maintain. Alexander Limi (Mozilla) has proposed using zip files to create resource packages. It’s a good idea because of its simplicity and graceful degradation.

border-radius

Creating rounded corners leads to code bloat and excessive HTTP requests. Border-radius reduces this to a simple CSS style. The only major browser that doesn’t support border-radius is IE. It has already been announced that IE9 will support border-radius, but I wanted to include it nevertheless.

cache redirects

Redirects are costly from a performance perspective, especially for users with high latency. Although the HTTP specsays 301 and 302 responses (with the proper HTTP headers) are cacheable, most browsers don’t support this.

IE8 – doesn’t cache redirects for the main page and for resources
Safari 4 – doesn’t cache redirects for the main page
Opera 10.10 – doesn’t cache redirects for the main page

(test case)

link prefetch

To improve page load times, developers prefetch resources that are likely or certain to be used later in the user’s session. This typically involves writing JavaScript code that executes after the onload event. When prefetching scripts and stylesheets, an iframe must be used to avoid conflict with the JavaScript and CSS in the main page. Using an iframe makes this prefetching code more complex. A final burden is the processing required to parse prefetched scripts and stylesheets. The browser UI can freeze while prefetched scripts and stylesheets are parsed, even though this is unnecessary as they’re not going to be used in the current page. A simple alternative solution is to use LINK PREFETCH. Firefox is the only major browser that supports this feature (since 1.0). Wider support of LINK PREFETCH would give developers an easy way to accelerate their web pages. (test case)

Web Timing spec

In order for web developers to improve the performance of their web sites, they need to be able to measure their performance – specifically their page load times. There’s debate on the endpoint for measuring page load times (window onload event, first paint event, onDomReady), but most people agree that the starting point is when the web page is requested by the user. And yet, there is no reliable way for the owner of the web page to measure from this starting point. Google has submitted the Web Timing proposal draft for browser builtin support for measuring page load times to address these issues.

remote JS debugging

Developers strive to make their web apps fast across all major browsers, but this requires installing and learning a different toolset for each browser. In order to get cross-browser web development tools, browsers need to support remote JavaScript debugging. There’s been progress in building protocols to support remote debugging: WebDebugProtocol and Crossfire in Firefox, Scope in Opera, and ChromeDevTools in Chrome. Agreement on the preferred protocol and support in the major browsers would go a long way to getting faster web apps for all users, and reducing the work for developers to maintain cross-browser web app performance.

Web Sockets

HTML5 Web Sockets provide built-in support for two-way communications between the client and server. The communication channel is accessible via JavaScript. Web Sockets are superior to comet and Ajax, especially in their compatibility with proxies and firewalls, and provide a path for building web apps with a high degree of communication between the browser and server.

History

HTML5 specifies implementation for History.pushState and History.replaceState. With these, web developers can dynamically change the URL to reflect the web application state without having to perform a page transition. This is important for Web 2.0 applications that modify the state of the web page using Ajax. Being able to avoid fetching a new HTML document to reflect these application changes results in a faster user experience.

anchor ping

The ping attribute for anchors provides a more performant way to track links. This is a controversial feature because of the association with “tracking” users. However, links are tracked today, it’s just done in a way that hurts the user experience. For example, redirects, synchronous XHR, and tight loops in unload handlers are some of the techniques used to ensure clicks are properly recorded. All of these create a slower user experience.

progressive XHR

The draft spec for XMLHttpRequest details how XHRs are to support progressive response handling. This is important for web apps that use data with varied response times as well as comet-style applications. (more information)

stylesheet & inline JS

When a stylesheet is followed by an inline script, resources that follow are blocked until the stylesheet is downloaded and the inline script is evaluated. Browsers should instead lookahead in their parsing and start downloading subsequent resources in parallel with the stylesheet. These resources of course would not be rendered, parsed, or evaluated until after the stylesheet was parsed and the inline script was evaluated. (test case see “|| CSS + Inline Script”; looks like this just landed in Firefox 3.6!)

SCRIPT DEFER for inline scripts

The benefit of the SCRIPT DEFER attribute for external scripts is discussed above. But DEFER is also useful for inline scripts that can be executed after the page has been parsed. Currently, IE8 supports this behavior. (test case)

@import improvements

@import is a popular alternative to the LINK tag for loading stylesheets, but it has several performance problems in IE:

LINK @import – If the first stylesheet is loaded using LINK and the second one uses @import, they are loaded sequentially instead of in parallel. (test case)
LINK blocks @import – If the first stylesheet is loaded using LINK, and the second stylesheet is loaded using LINK that contains @import, that @import stylesheet is blocked from downloading until the first stylesheet response is received. It would be better to start downloading the @import stylesheet immediately. (test case)
many @imports – Using @import can change the download sequence of resources. In this test case, multiple stylesheets loaded with @import are followed by a script. Even though the script is listed last in the HTML document, it gets downloaded first. If the script takes a long time to download, it can causes the stylesheet downloads to be delayed, which can cause rendering to be delayed. It would be better to follow the order specified in the HTML document. (test case)

(more information)

@font-face improvements

In IE8, if a script occurs before a style that uses @font-face, the page is blocked from rendering until the font file is done downloading. It would be better to render the rest of the page without waiting for the font file. (test case, blog post)

stylesheets & iframes

When an iframe is preceded by an external stylesheet, it blocks iframe downloads. In IE, the iframe is blocked from downloading until the stylesheet response is received. In Firefox, the iframe’s resources are blocked from downloading until the stylesheet response is received. There’s no dependency between the parent’s stylesheet and the iframe’s HTML document, so this blocking behavior should be removed. (test case)

paint events

As the amount of DOM elements and CSS grows, it’s becoming more important to be able to measure the performance of painting the page. Firefox 3.5 added the MozAfterPaint event which opened the door for add-ons like Firebug Paint Events (although early Firefox documentation noted that the “event might fire before the actual repainting happens“). Support for accurate paint events will allow developers to capture these metrics.

missing schema, double downloads

In IE7&8, if the “http:” schema is missing from a stylesheet’s URL, the stylesheet is downloaded twice. This makes the page render more slowly. Not including “http://” in URLs is not pervasive, but it’s getting more widely adopted because it reduces download size and resolves to “http://” or “https://” as appropriate. (test case)

28 Comments

5e speculative background images

February 12, 2010 6:09 pm | 13 Comments

This is the fifth of five quick posts about some browser quirks that have come up in the last few weeks.

Chrome and Safari start downloading background images before all styles are available. If a background image style gets overwritten this may cause wasteful downloads.

Background images are used everywhere: buttons, background wallpaper, rounded corners, etc. You specify a background image in CSS like so:

.bgimage { background-image: url("/images/button1.gif"); }

Downloading resources is an area for optimizing performance, so it’s important to understand what causes CSS background images to get downloaded. See if you can answer the following questions about button1.gif:

Suppose no elements in the page use the class “bgimage”. Is button1.gif downloaded?
Suppose an element in the page has the class “bgimage” but also has “display: none” or “visibility: hidden”. Is button1.gif downloaded?
Suppose later in the page a stylesheet gets downloaded and redefines the “bgimage” class like this:
```
.bgimage { background-image: url("/images/button2.gif"); }
```
Is button1.gif downloaded?

Ready?

The answer to question #1 is “no”. If no elements in the page use the rule, then the background image is not downloaded. This is true in all browsers that I’ve tested.

The answer to question #2 is “depends on the browser”. This might be surprising. Firefox 3.6 and Opera 10.10 do not download button1.gif, but the background image is downloaded in IE 8, Safari 4, and Chrome 4. I don’t have an explanation for this, but I do have a test page: hidden background images. If you have elements with background images that are hidden initially, you should hold off on creating them until after the visible content in the page is rendered.

The answer to question #3 is “depends on the browser”. I find this to be the most interesting behavior to investigate. According to the cascading behavior of CSS, the latter definition of the “bgimage” class should cause the background-image style to use button2.gif. And in all the major browsers this is exactly what happens. But Safari 4 and Chrome 4 are a little more aggressive about fetching background images. They download button1.gif on the speculation that the background-image property won’t be overwritten, and then later download button2.gif when it is overwritten. Here’s the test page: speculative background images.

When my officemate, Steve Lamm, pointed out this behavior to me, my first reaction was “that’s wasteful!” I love prefetching, but I’m not a big fan of most prefetching implementations because they’re too aggressive – they err too far on the side of downloading resources that never get used. After my initial reaction, I thought about this some more. How frequently would this speculative background image downloading be wasteful? I went on a search and couldn’t find any popular web site that overwrote the background-image style. Not one. I’m not saying pages like this don’t exist, I’m just saying it’s very atypical.

On the other hand, this speculative downloading of background images can really help performance and the user’s perception of page speed. Many web sites have multiple stylesheets. If background images don’t start downloading until all stylesheets are done loading, the page takes longer to render. Safari and Chrome’s behavior of downloading a background image as soon as an element needs it, even if one or more stylesheets are still downloading, is a nice performance optimization.

That’s a nice way to finish the week. Next week: my Browser Performance Wishlist.

The five posts in this series are:

13 Comments

5d dynamic stylesheets

February 12, 2010 1:13 am | 7 Comments

This is the fourth of five quick posts about some browser quirks that have come up in the last few weeks.

You can avoid blocking rendering in IE if you load stylesheets using DHTML and setTimeout.

A few weeks ago I had a meeting with a company that makes a popular widget. One technique they used to reduce their widget’s impact on the main page was to load a stylesheet dynamically, something like this:

var link = document.createElement('link');
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = '/main.css';
document.getElementsByTagName('head')[0].appendChild(link);

Most of my attention for the past year has been on loading scripts dynamically to avoid blocking downloads. I haven’t focused on loading stylesheets dynamically. When it comes to stylesheets, blocking downloads isn’t an issue – stylesheets don’t block downloads (except in Firefox 2.0). The thing to worry about when downloading stylesheets is that IE blocks rendering until all stylesheets are downloaded¹, and other browsers might experience a Flash Of Unstyled Content (FOUC).

FOUC isn’t a concern for this widget – the rules in the dynamically-loaded stylesheet only apply to the widget, and the widget hasn’t been created yet so nothing can flash. If the point of loading the stylesheet dynamically is to not mess with the containing page, we have to make sure dynamic stylesheets don’t block the page from rendering in IE.

I created the DHTML stylesheet example to show what happens. The page loads a stylesheet dynamically. The stylesheet is configured to take 4 seconds to download. If you load the page in Internet Explorer the page is blank for 4 seconds. In order to decouple the stylesheet load from page rendering, the DHTML code has to be invoked using setTimeout. That’s what I do in the DHTML + setTimeout stylesheet test page. This works. The page renders immediately while the stylesheet is downloaded in the background.

This technique is applicable when you have stylesheets that you want to load in the page but the stylesheet’s rules don’t apply to any DOM elements in the page currently. This is a pretty small use case. It makes sense for widgets or pages that have DHTML features that aren’t invoked until after the page has loaded. If you find yourself in that situation, you can use this technique to avoid the blank white screen in IE.

The five posts in this series are:

¹	Simple test pages may not reproduce this problem. My testing shows that you need a script (inline or external) above the stylesheet, or two or more stylesheets for rendering to be blocked. If your page has only one stylesheet and no SCRIPT tags, you might not experience this issue.

7 Comments

5c media=print stylesheets

February 11, 2010 5:27 pm | 2 Comments

This is the third of five quick posts about some browser quirks that have come up in the last few weeks.

Stylesheets set with media=”print” still block rendering in Internet Explorer.

A few weeks ago a friend at a top web company pinged me about a possible bug in Page Speed and YSlow. Both tools were complaining about stylesheets he placed at the bottom of his page, an obvious violation of my put stylesheets at the top rule from High Performance Web Sites. The reasoning behind this rule is that Internet Explorer won’t start rendering the page until all stylesheets are downloaded¹, and other browsers might produce the Flash Of Unstyled Content (FOUC). It’s best to put stylesheets at the top so they get downloaded as soon as possible.

His reason for putting these stylesheets at the bottom was that they were specified with media="print". Since these stylesheets weren’t going to be used to render the current page, he wanted to load them last so that other more important resources could get downloaded sooner. Going back to the reasons for the “put stylesheets at the top” rule, he wouldn’t have to worry about FOUC (the stylesheets wouldn’t be applied to the current page). But would he have to worry about IE blocking the page from rendering? Time for a test page.

The media=print stylesheets test page contains one stylesheet at the bottom with media="print". This stylesheet is configured to take 4 seconds to download. If you view this page in Internet Explorer you’ll see that rendering is indeed blocked for 4 seconds (tested on IE 6, 7, & 8).

I’m surprised browsers haven’t gotten to the point where they skip downloading stylesheets for a different media type than the current one. I’ve asked some web devs but no one can think of a good reason for doing this. In the meantime, even if you have stylesheets with media="print" you might want to follow the advice of Page Speed and YSlow and put them in the document HEAD. Or you could try loading them dynamically. That’s the topic I’ll cover in my next blog post.

The five posts in this series are:

¹	Simple test pages may not reproduce this problem. My testing shows that you need a script (inline or external) above the stylesheet, or two or more stylesheets for rendering to be blocked. If your page has only one stylesheet and no SCRIPT tags, you might not experience this issue.

2 Comments

5a Missing schema double download

February 10, 2010 5:12 pm | 15 Comments

This is the first of five quick posts about some browser quirks that have come up in the last few weeks.

Internet Explorer 7 & 8 will download stylesheets twice if the http(s) protocol is missing.

If you have an HTTPS page that loads resources with “http://” in the URL, IE halts the download and displays an error dialog. This is called mixed content and should be avoided. How should developers code their URLs to avoid this problem? You could do it on the backend in your HTML template language. But a practice that is getting wider adoption is protocol relative URLs.

A protocol relative URL doesn’t contain a protocol. For example,

https://stevesouders.com/images/book-84x110.jpg

becomes

//stevesouders.com/images/book-84x110.jpg

Browsers substitute the protocol of the page itself for the resource’s missing protocol. Problem solved! In fact, today’s HttpWatch Blog posted about this: Using Protocol Relative URLs to Switch between HTTP and HTTPS.

However, if you try this in Internet Explorer 7 and 8 you’ll see that stylesheets specified with a protocol relative URL are downloaded twice. Hard to believe, but true. My officemate, Steve Lamm, discovered this when looking at the new Nexus One Phone page. That page fetches a stylesheet like this:

<link type="text/css" rel="stylesheet" href="//www.google.com/phone/static/2496921881-SiteCss.css">

Notice there’s no protocol. If you load this page in Internet Explorer 7 and 8 the waterfall chart (nicely generated by HttpWatch) looks like this:

Notice 2496921881-SiteCss.css is downloaded twice, and each time it’s a 200 response, so it’s not being read from cache.

It turns out this only happens with stylesheets. The Missing schema, double download test page I created contains a stylesheet, an image, and a script that all have protocol relative URLs pointing to 1.cuzillion.com. The stylesheet is downloaded twice, but the image and script are only downloaded once. I added another stylesheet from 2.cuzillion.com that has a full URL (i.e., it starts with “http:”). This stylesheet is only downloaded once.

Developers should avoid using protocol relative URLs for stylesheets if they want their pages to be as fast as possible in Internet Explorer 7 & 8.

The five posts in this series are:

15 Comments

Browser script loading roundup

February 7, 2010 12:12 am | 8 Comments

How are browsers doing when it comes to parallel script loading?

Back in the days of IE7 and Firefox 2.0, no browser loaded scripts in parallel with other resources. Instead, these older browsers would block all subsequent resource requests until the script was received, parsed, and executed. Here’s how the HTTP requests look when this blocking occurs in older browsers:

Scripts block parallel downloads in older browsers.

The test page that generated this waterfall chart has six HTTP requests:

the HTML document
the 1st script – 2 seconds to download, 2 seconds to execute
the 2nd script – 2 seconds to download, 2 seconds to execute
an image – 1 second to download
a stylesheet- 1 second to download
an iframe – 1 second to download

The figure above shows how the scripts block each other and block the image, stylesheet, and iframe, as well. The image, stylesheet, and iframe download in parallel with each other, but not until the scripts are finished downloading sequentially.

The likely reason scripts were downloaded sequentially in older browsers was to preserve execution order. This is critical when code in the 2nd script depends on symbols defined in the 1st script. Preserving execution order avoids undefined symbol errors. But the missed opportunity is obvious – while the browser is downloading the first script and guaranteeing to execute it first, it could be downloading the other four resources in parallel.

Thankfully, newer browsers now load scripts in parallel!

This is a big win for today’s web apps that often contain 100K+ of JavaScript split across multiple files. Loading the same test page in IE8, Firefox 3.6, Chrome 4, and Safari 4 produces an HTTP waterfall chart like this:

Scripts load in parallel in newer browsers

Things look a lot better, but not as good as they should be. In this case, IE8 loads the two scripts and stylesheet in parallel, but the image and iframe are blocked. All of the newer browsers have similar limitations with regard to the extent to which they load scripts in parallel with other types of resources. This table from Browserscope shows where we are and the progress made to get to this point. The recently added “Compare” button added to Browserscope made it easy to generate this historical view.

While downloading scripts, IE8 still blocks on images and iframes. Chrome 4, Firefox 3.6, and Safari 4 block on iframes. Opera 10.10 blocks on all resource types. I’m confident parallel script loading will continue to improve based on the great progress made in the last batch of browsers. Let’s keep our eyes on the next browsers to see if things improve even more.

8 Comments

Loading Scripts Without Blocking

April 27, 2009 10:49 pm | 47 Comments

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites. Posts in this series include: chapters and contributing authors, Splitting the Initial Payload, Loading Scripts Without Blocking, Coupling Asynchronous Scripts, Positioning Inline Scripts, Sharding Dominant Domains, Flushing the Document Early, Using Iframes Sparingly, and Simplifying CSS Selectors.

As more and more sites evolve into “Web 2.0” apps, the amount of JavaScript increases. This is a performance concern because scripts have a negative impact on page performance. Mainstream browsers (i.e., IE 6 and 7)Â block in two ways:

Resources in the page are blocked from downloading if they are below the script.
Elements are blocked from rendering if they are below the script.

The Scripts Block Downloads example demonstrates this. It contains two external scripts followed by an image, a stylesheet, and an iframe. The HTTP waterfall chart from loading this example in IE7 shows that the first script blocks all downloads, then the second script blocks all downloads, and finally the image, stylesheet, and iframe all download in parallel. Watching the page render, you’ll notice that the paragraph of text above the script renders immediately. However, the rest of the text in the HTML document is blocked from rendering until all the scripts are done loading.

Scripts block downloads in IE6&7, Firefox 2&3.0, Safari 3, ChromeÂ 1, and Opera

Browsers are single threaded, so it’s understandable that while a script is executing the browser is unable to start other downloads. But there’s no reason that while the script is downloading the browser can’t start downloading other resources. And that’s exactly what newer browsers, including Internet Explorer 8, Safari 4, and Chrome 2, have done. The HTTP waterfall chart for the Scripts Block Downloads example in IE8 shows the scripts do indeed download in parallel, and the stylesheet is included in that parallel download. But the image and iframe are still blocked. Safari 4 and Chrome 2 behave in a similar way. Parallel downloading improves, but is still not as much as it could be.

Scripts still block, even in IE8, Safari 4, and ChromeÂ 2

Fortunately, there are ways to get scripts to download without blocking any other resources in the page, even in older browsers. Unfortunately, it’s up to the web developer to do the heavy lifting.

There are six main techniques for downloading scripts without blocking:

XHR Eval – Download the script via XHR and eval() the responseText.
XHR Injection – Download the script via XHR and inject it into the page by creating a script element and setting its text property to the responseText.
Script in Iframe – Wrap your script in an HTML page and download it as an iframe.
Script DOM Element – Create a script element and set its src property to the script’s URL.
Script Defer – Add the script tag’s defer attribute. This used to only work in IE, but is now in Firefox 3.1.
document.write Script Tag – Write the <script src=""> HTML into the page using document.write. This only loads script without blocking in IE.

You can see an example of each technique using Cuzillion. It turns out that these techniques have several important differences, as shown in the following table. Most of them provide parallel downloads, although Script Defer and document.write Script Tag are mixed. Some of the techniques can’t be used on cross-site scripts, and some require slight modifications to your existing scripts to get them to work. An area of differentiation that’s not widely discussed is whether the technique triggers the browser’s busy indicators (status bar, progress bar, tab icon, and cursor). If you’re loading multiple scripts that depend on each other, you’ll need a technique that preserves execution order.

Technique	Parallel Downloads	Domains can Differ	Existing Scripts	Busy Indicators	Ensures Order	Size (bytes)
XHR Eval	IE, FF, Saf, Chr, Op	no	no	Saf, Chr	–	~500
XHR Injection	IE, FF, Saf, Chr, Op	no	yes	Saf, Chr	–	~500
Script in Iframe	IE, FF, Saf, Chr, Op	no	no	IE, FF, Saf, Chr	–	~50
Script DOM Element	IE, FF, Saf, Chr, Op	yes	yes	FF, Saf, Chr	FF, Op	~200
Script Defer	IE, Saf4, Chr2, FF3.1	yes	yes	IE, FF, Saf, Chr, Op	IE, FF, Saf, Chr, Op	~50
document.write Script Tag	IE, Saf4, Chr2, Op	yes	yes	IE, FF, Saf, Chr, Op	IE, FF, Saf, Chr, Op	~100

The question is: Which is the best technique? The optimal technique depends on your situation. This decision tree should be used as a guide. It’s not as complex as it looks. Only three variables determine the outcome: is the script on the same domain as the main page, is it necessary to preserve execution order, and should the busy indicators be triggered.

Ideally, the logic in this decision tree would be encapsulated in popular HTML templating languages (PHP, Python, Perl, etc.) so that the web developer could just call a function and be assured that their script gets loaded using the optimal technique.

In many situations, the Script DOM Element is a good choice. It works in all browsers, doesn’t have any cross-site scripting restrictions, is fairly simple to implement, and is well understood. The one catch is that it doesn’t preserve execution order across all browsers. If you have multiple scripts that depend on each other, you’ll need to concatenate them or use a different technique. If you have an inline script that depends on the external script, you’ll need to synchronize them. I call this “coupling” and present several ways to do this in Coupling Asynchronous Scripts.

47 Comments

Performance Impact of CSS Selectors

March 10, 2009 11:28 pm | 53 Comments

A few months back there were some posts about the performance impact of inefficient CSS selectors. I was intrigued – this is the kind of browser idiosyncratic behavior that I live for. On further investigation, I’m not so sure that it’s worth the time to make CSS selectors more efficient. I’ll go even farther and say I don’t think anyone would notice if we woke up tomorrow and every web page’s CSS selectors were magically optimized.

The first post that caught my eye was about CSS Qualified Selectors by Shaun Inman. This post wasn’t actually about CSS performance, but in one of the comments David Hyatt (architect for Safari and WebKit, also worked on Mozilla, Camino, and Firefox) dropped this bomb:

The sad truth about CSS3 selectors is that they really shouldnâ€™t be used at all if you care about page performance. Decorating your markup with classes and ids and matching purely on those while avoiding all uses of sibling, descendant and child selectors will actually make a page perform significantly better in all browsers.

Wow. Let me say that again. Wow.

The next posts were amazing. It was a series on Testing CSS Performance from Jon Sykes in three parts: part 1, part 2, and part 3. It’s fun to see how his tests evolve, so part 3 is really the one to read. This had me convinced that optimizing CSS selectors was a key step to fast pages.

But there were two things about the tests that troubled me. First, the large number of DOM elements and rules worried me. The pages contain 60,000 DOM elements and 20,000 CSS rules. This is an order of magnitude more than most pages. Pages this large make browsers behave in unusual ways (we’ll get back to that later). The table below has some stats from the top ten U.S. web sites for comparison.

Web Site	# CSS Rules	#DOM Elements
AOL	2289	1628
eBay	305	588
Facebook	2882	1966
Google	92	552
Live Search	376	449
MSN	1038	886
MySpace	932	444
Wikipedia	795	1333
Yahoo!	800	564
YouTube	821	817
average	1033	923

The second thing that concerned me was how small the baseline test page was, compared to the more complex pages. The main question I want to answer is “do inefficient CSS selectors slow down pages?” All five test pages contain 20,000 anchor elements (nested inside P, DIV, DIV, DIV). What changes is their CSS: baseline (no CSS), tag selector (one rule for the A tag), 20,000 class selectors, 20,000 child selectors, and finally 20,000 descendant selectors. The last three pages top out at over 3 megabytes in size. But the baseline page and tag selector page, with little or no CSS, are only 1.8 megabytes. These pages answer the question “how much faster would my page be if I eliminated all CSS?” But not many of us are going to eliminate all CSS from our pages.

I revised the test as follows:

2000 anchors and 2000 rules (instead of 20,000) – this actually results in ~6000 DOM elements because of all the nesting in P, DIV, DIV, DIV
the baseline page and tag selector page have 2000 rules just like all the other pages, but these are simple class rules that don’t match any classes in the page

I ran these tests on 12 browsers. Page render time was measured with a script block at the top and bottom of the page. (I loaded the page from local disk to avoid possible impact from chunked encoding.) The results are shown in the chart below. (I don’t show Opera 9.63 – it was way too slow – but you can download all the data as csv. You can also see the test pages.)

Performance varies across browsers; strangely, two new browsers, IE 8 and Firefox 3.1, are the slowest but comparisons should not be made from one browser to another. Although all the tests for a given browser were conducted on a single PC, different browsers might have been tested on different PCs with different performance characteristics. The goal of this experiment is not to compare browser performance – it’s to see how browsers handle progressively more complex CSS selectors.

[Revision: On further inspection comparing Firefox 3.0 and 3.1, I discovered that the test PC I used for testing Firefox 3.1 and IE 8 was slower than the other test PCs used in this experiment. I subsequently re-ran those tests as well as Firefox 3.0 and IE 7 on PCs that were more consistent and updated the chart above. Even with this re-run, because of possible differences in test hardware, do not use this data to compare one browser to another.]

Not surprisingly, the more complex pages (child selectors and descendant selectors) usually perform the worst. The biggest surprise is how small the delta is from the baseline to the most complex, worst performing test page. The average slowdown across all browsers is 50 ms, and if we look at the big ones (IE 6&7, FF3), the average delta is just 20 ms. For 70% or more of today’s users, improving these CSS selectors would only make a 20 ms improvement.

Keep in mind – these test pages are close to worst case. The 2000 anchors wrapped in P, DIV, DIV, DIV result in 6000 DOM elements – that’s twice as big as the max in the top ten sites. And the complex pages have 2000 extremely inefficient rules – a typical site has around one third of their rules that are complex child or descendant selectors. Facebook, for example, with the maximum number of rules at 2882 only has 750 that are these extremely inefficient rules.

Why do the results from my tests suggest something different from what’s been said lately? One difference comes from looking at things at such a large scale. It’s okay to exaggerate test cases if the results are proportional to common use cases. But in this case, browsers behave differently when confronted with a 3 megabyte page with 60,000 elements and 20,000 rules. I especially noticed that my results were much different for IE 6&7. I wondered if there was a hockey stick in how IE handled CSS selectors. To investigate this I loaded the child selector and descendant selector pages with increasing number of anchors and rules, from 1000 to 20,000. The results, shown in the chart below, reveal that IE hits a cliff around 18,000 rules. But when IE 6&7 work on a page that is closer to reality, as in my tests, they’re actually the fastest performers.

Based on these tests I have the following hypothesis: For most web sites, the possible performance gains from optimizing CSS selectors will be small, and are not worth the costs. There are some types of CSS rules and interactions with JavaScript that can make a page noticeably slower. This is where the focus should be. So I’m starting to collect real world examples of small CSS style-related issues (offsetWidth, :hover) that put the hurt on performance. If you have some, send them my way. I’m speaking at SXSW this weekend. If you’re there, and want to discuss CSS selectors, please find me. It’s important that we’re all focusing on the performance improvements that our users will really notice.

53 Comments