SERIOUS CONFUSION with Resource Timing

November 25, 2014 1:20 pm | 2 Comments

or “Duration includes Blocking”

Resource Timing is a great way to measure how quickly resources download. Unfortunately, almost everyone I’ve spoken with does this using the “duration” attribute without being aware that “duration” includes blocking time. As a result, “duration” values are (much) greater than the actual download time, giving developers unexpected results. The issue is especially bad for cross-origin resources, where “duration” is the only metric available. In this post I describe the problem and a proposed solution.

Resource Timing review

The Resource Timing specification defines APIs for gathering timing metrics for each resource in a web page. It’s currently available in Chrome, Chrome for Android, IE 10-11, and Opera. You can gather a list of PerformanceEntry objects using getEntries(), getEntriesByType(), and getEntriesByName(). A PerformanceEntry has these properties:

  • name – the URL
  • entryType – typically “resource”
  • startTime – time that the resource started getting processed (in milliseconds relative to page navigation)
  • duration – total time to process the resource (in milliseconds)
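
For example, here’s a quick snippet that logs these properties for every resource in the page:

// Log the basic timing info for each resource in the page.
performance.getEntriesByType("resource").forEach(function(r) {
    console.log(r.entryType + ": " + r.name +
        " startTime: " + r.startTime.toFixed(0) + "ms" +
        " duration: " + r.duration.toFixed(0) + "ms");
});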

The properties above are available for all resources – both same-origin and cross-origin. However, same-origin resources have additional properties available as defined by the PerformanceResourceTiming interface. They’re self-explanatory and occur pretty much in this chronological order:

  • redirectStart
  • redirectEnd
  • fetchStart
  • domainLookupStart
  • domainLookupEnd
  • connectStart
  • connectEnd
  • secureConnectionStart
  • requestStart
  • responseStart
  • responseEnd

Here’s the canonical processing model graphic that shows the different phases. Note that “duration” is equal to (responseEnd – startTime).

Take a look at my post Resource Timing Practical Tips for more information on how to use Resource Timing.

Unexpected blocking bloat in “duration”

The detailed PerformanceResourceTiming properties are restricted to same-origin resources for privacy reasons. (Note that any resource can be made “same-origin” by using the Timing-Allow-Origin response header.) About half of the resources on today’s websites are cross-origin, so “duration” is the only way to measure their load time. And even for same-origin resources, “duration” is the only delta provided, presumably because it’s the most important phase to measure. As a result, all of the Resource Timing implementations I’ve seen use “duration” as the primary performance metric.

Unfortunately, “duration” is more than download time. It also includes “blocking time”: the delay between when the browser realizes it needs to download a resource and the time that it actually starts downloading it. Blocking can occur in several situations. The most typical is when there are more resources than TCP connections. Most browsers open only 6 TCP connections per hostname, the exceptions being IE10 (8 connections) and IE11 (12 connections).

This Resource Timing blocking test page has 16 images, so some images incur blocking time no matter which browser is used. Each of the images is programmed on the server to have a 1 second delay. The “startTime” and “duration” are displayed for each of the 16 images. Here are WebPagetest results for this test page being loaded in Chrome, IE10, and IE11. You can look at the screenshots to read the timing results. Note how “startTime” is approximately the same for all images. That’s because this is the time that the browser parsed the IMG tag and realized it needed to download the resource. But the “duration” values increase in steps of ~1 second for the images that occur later in the page. This is because they are blocked from downloading by the earlier images.

In Chrome, for example, the images are downloaded in three sets – because Chrome only downloads 6 resources at a time. The first six images have a “duration” of ~1.3 seconds (the 1 second backend delay plus some time for establishing the TCP connections and downloading the response body). The next six images have a “duration” of ~2.5 seconds. The last four images have a “duration” of ~3.7 seconds. The second set is blocked for ~1 second waiting for the first set to finish. The third set is blocked for ~2 seconds waiting for sets 1 & 2 to finish.

Even though the “duration” values increase from 1 to 2 to 3 seconds, the actual download time for all images is ~1 second as shown by the WebPagetest waterfall chart.

The results are similar for IE10 and IE11. IE10 starts with six parallel TCP connections but then ramps up to eight connections. IE11 also starts with six parallel TCP connections but then ramps up to twelve. Both IE10 and IE11 exhibit the same problem – even though the load time for every image is ~1 second, “duration” shows values ranging from 1-3 seconds.

proposal: “networkDuration”

Clearly, “duration” is not an accurate way to measure resource load times because it may include blocking time. (It also includes redirect time, but that occurs much less frequently.) Unfortunately, “duration” is the only metric available for cross-origin resources. Therefore, I’ve submitted a proposal to the W3C Web Performance mailing list to add “networkDuration” to Resource Timing. This would be available for both same-origin and cross-origin resources. (I’m flexible about the name; other candidates include “networkTime”, “loadTime”, etc.)

The calculation for “networkDuration” is as follows. (Assume “r” is a PerformanceResourceTiming object.)

var dns     = r.domainLookupEnd - r.domainLookupStart;
var tcp     = r.connectEnd - r.connectStart; // includes ssl negotiation
var waiting = r.responseStart - r.requestStart;
var content = r.responseEnd - r.responseStart;
var networkDuration = dns + tcp + waiting + content;

Developers working with same-origin resources can do the calculations above to derive “networkDuration” themselves. However, providing the result as a new attribute simplifies the process. It also avoids errors: companies and teams are likely to compare these values, so it’s important to ensure an apples-to-apples comparison. But the primary need for “networkDuration” is cross-origin resources, where “duration” is the only metric available. I’ve found several teams that were tracking “duration” assuming it meant download time. They were surprised when I explained that it also includes blocking time, and agreed it was not the metric they wanted; instead they wanted the equivalent of “networkDuration”.
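
Here’s a sketch of that calculation applied across a page’s resources; the helper name and the fallback to “duration” for restricted resources are my own:

// Hypothetical helper: network time for a resource, falling back to
// "duration" (which includes blocking) when the detailed timestamps
// are restricted.
function networkDuration(r) {
    if ( !r.requestStart ) {
        // cross-origin without Timing-Allow-Origin - details restricted
        return r.duration;
    }
    var dns = r.domainLookupEnd - r.domainLookupStart;
    var tcp = r.connectEnd - r.connectStart; // includes ssl negotiation
    var waiting = r.responseStart - r.requestStart;
    var content = r.responseEnd - r.responseStart;
    return dns + tcp + waiting + content;
}

performance.getEntriesByType("resource").forEach(function(r) {
    var network = networkDuration(r);
    var blocking = r.duration - network; // 0 when the fallback was used
    console.log(r.name, network.toFixed(0) + "ms network",
        blocking.toFixed(0) + "ms blocking");
});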

I mentioned previously that the detailed time values (domainLookupStart, connectStart, etc.) are restricted to same-origin resources for privacy reasons. The proposal to add “networkDuration” is likely to raise privacy concerns; specifically that by removing blocking time, “networkDuration” would enable malicious third party JavaScript to determine whether a resource was read from cache. However, it’s possible to remove blocking time today using “duration” by loading a resource when there’s no blocking contention (e.g., after window.onload). Even when blocking time is removed, it’s ambiguous whether a resource was read from cache or loaded over the network. Even a cache read will have non-zero load times.

The problem that “networkDuration” solves is finding the load time for more typical resources that are loaded during page creation and might therefore incur blocking time.

Takeaway

It’s not possible today to use Resource Timing to accurately measure load time for cross-origin resources. Companies that want to measure load time plus blocking time can use “duration”, but all the companies I’ve spoken with want to measure the actual load time (without blocking time). To provide better performance metrics, I encourage the addition of “networkDuration” to the Resource Timing specification. If you agree, please voice your support in a reply to my “networkDuration” proposal on the W3C Web Performance mailing list.

2 Comments

Request Timeout

November 14, 2014 3:15 am | 2 Comments

With the increase in 3rd party content on websites, I’ve evangelized heavily about how Frontend SPOF blocks the page from rendering. This is timely given the recent Doubleclick outage. Although I’ve been warning about Frontend SPOF for years, I’ve never measured how long a hung response blocks rendering. I used to think this depended on the browser, but Pat Meenan recently mentioned he thought it depended more on the operating system. So I decided to test it.

My test page contains a request for a script that will never return, served from Pat’s blackhole server. Eventually the request times out and the page finishes loading, so measuring when window.onload fires captures how long the hung request blocked the page. I tweeted asking people to run the test and collected the results in a Browserscope user test.
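
Here’s a minimal sketch of the measurement, assuming the page contains a blocking script tag pointing at the blackhole server (the exact script URL is illustrative):

// The page includes something like:
//   <script src="http://blackhole.webpagetest.org/hang.js"></script>
// That request never returns, so onload is delayed until the
// browser/OS gives up. Navigation Timing captures the delay.
window.addEventListener("load", function() {
    var t = performance.timing;
    var secs = (t.loadEventStart - t.navigationStart) / 1000;
    document.title = "onload after " + secs.toFixed(1) + "s";
}, false);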

The aggregated results show the median timeout value (in seconds) for each type of browser. Unfortunately, this doesn’t reflect operating system. Instead, I exported the raw results and did some UA parsing to extract an approximation for OS. The final outcome can be found in this Google Spreadsheet of Blackhole Request Timeout values.

Sorting this by OS we see that Pat was generally right. Here are median timeout values by OS:

  • Android: ~60 seconds
  • iOS: ~75 seconds
  • Mac OS: ~75 seconds
  • Windows: ~20 seconds

The timeout values above are independent of browser. For example, on Mac OS the timeout value is ~75 seconds for Chrome, Firefox, Opera, and Safari.

However, there are a lot of outliers. Ilya Grigorik points out that there are a lot of variables affecting when the request times out; in addition to browser and OS, there may be server and proxy settings that factor into the results. I also tested with my mobile devices and got different results when switching between carrier network and wifi.

The results of this test show that there are more questions to be answered. It would take someone like Ilya, with extensive knowledge of browser networking, to nail down all the factors involved. A general guideline is that Frontend SPOF from a hung response lasts from 20 to 75 seconds, depending on browser and OS.

2 Comments

do u webview?

October 9, 2014 3:52 am | 1 Comment

A “webview” is a browser bundled inside of a mobile application, producing what is called a hybrid app. Using a webview allows mobile apps to be built using Web technologies (HTML, JavaScript, CSS, etc.) yet still be packaged as a native app and distributed through the app stores. In addition to letting devs work with familiar technologies, other advantages of building a hybrid app include greater code reuse across the app and the website, and easier support for multiple mobile platforms.

We all have webview traffic

Deciding whether to build a hybrid app versus a native app, or to have an app at all, is a lengthy debate and not the point of this post. Even if you don’t have a hybrid app, a significant amount of your mobile traffic comes from webviews, because many sources of traffic are themselves hybrid apps. Two examples on iOS are the Facebook app and Google Chrome. “Whoa, whoa, whoa,” you say, “Facebook’s retreat from its hybrid app is well known.” That’s true. The Facebook timeline, for example, is no longer rendered using a webview:

Facebook timeline

However, the Facebook timeline contains links, such as the link to http://www.guggenheim.org/ in the timeline above. When users click on links in the timeline, the Facebook app opens those in a webview:

Facebook webview

Similarly, Chrome for iOS is implemented using a webview. Across all iOS traffic, 6% comes from Facebook’s webview and 5% comes from Google Chrome according to ScientiaMobile. And there are other examples: Twitter’s iOS app uses a webview to render clicked links, etc.

I encourage you to scan your server logs to gauge how much of your mobile traffic comes from webviews. There’s not much documentation on webview User-Agent strings. For iOS, the User-Agent is typically a base string with information appended by the app. Here’s the User-Agent string for Facebook’s webview:

Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_1 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Mobile/11D201 [FBAN/FBIOS;FBAV/12.1.0.24.20; FBBV/3214247; FBDV/iPhone6,1;FBMD/iPhone; FBSN/iPhone OS;FBSV/7.1.1; FBSS/2; FBCR/AT&T;FBID/phone;FBLC/en_US;FBOP/5]

Here’s the User-Agent string from Chrome for iOS:

Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) CriOS/37.0.2062.60 Mobile/11D257 Safari/9537.53
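
Here’s a rough sketch of how you might flag iOS webview traffic from a User-Agent string; the token list is illustrative, not exhaustive:

// Rough heuristic for spotting iOS webview traffic in a UA string.
// FBAN/FBAV identify the Facebook app; CriOS identifies Chrome for
// iOS; an iOS UA with "Mobile" but no "Safari" token is typically
// a generic UIWebView.
function isIOSWebview(ua) {
    if ( !/iPhone|iPad|iPod/.test(ua) ) {
        return false;
    }
    if ( /FBAN|FBAV|CriOS/.test(ua) ) {
        return true;
    }
    return /Mobile/.test(ua) && !/Safari/.test(ua);
}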

That’s a lot of detail. The bottom line is: we’re all getting more webview traffic than we expect. Therefore, it’s important that we understand how webviews perform and take that into consideration when building our mobile websites.

Webview performance

Since a webview is just a bundled browser, we might think that webviews and their mobile browser counterparts have similar performance profiles. It turns out that this is not the case. This was discovered as an unintentional side effect of the article iPhone vs. Android – 45,000 Tests Prove Who is Faster. This 2011 article, from the days of iOS 4.3, noted that the iPhone browser was 52% slower than Android’s. The results were so dramatic that they triggered the following response from Apple:

[Blaze's] testing is flawed. They didn’t actually test the Safari browser on the iPhone. Instead they only tested their own proprietary app, which uses an embedded Web viewer that doesn’t actually take advantage of Safari’s Web performance optimizations.

Apple’s response is accurate. The study by Blaze (now part of Akamai) was conducted using a webview, so it was not a true comparison of each platform’s mobile browser. But the more important revelation is that webviews were hobbled, resulting in worse performance than mobile Safari. Specifically, the webview on iOS 4.3 did not have Nitro’s JIT compiler for JavaScript, application cache, or asynchronous script loading.

This means it’s not enough to track the performance of mobile browsers alone; we also need to track the performance of webviews. This is especially true in light of the fact that more than 10% of iOS traffic comes from webviews. Luckily, the state of webviews is better than it was in 2011. Even better, the most recent webviews have significantly more features when it comes to performance. The following table compares the most recent iOS and Android webviews along a set of important performance features.

                        iOS 7       iOS 8       Android 4.3   Android 4.4
                        UIWebView   WKWebView   Webkit        Chromium
                                                Webview       Webview
Nitro/V8                no          yes         no            yes
html5test.com           410         440         278           434
localStorage            yes         yes         yes           yes
app cache               yes         yes         yes           yes
indexedDB               no          yes         no            yes
SPDY                    no          yes         no            yes
WebP                    no          no          no            yes
srcset                  no          yes         no            ?
WebGL                   no          yes         no            ?
requestAnimationFrame   yes         yes         no            yes
Nav Timing              no          yes         yes           yes
Resource Timing         no          no          no            yes

As shown in this table, the newest webviews have dramatically better performance. The most important improvement is JIT compilation for JavaScript. While localStorage and app cache now have support across all webviews, the newer webviews add support for indexedDB. Support for SPDY in the newer webviews is important to help mitigate the impact of slow mobile networks. WebP, image srcset, and WebGL address the bloat of mobile images, but support for these features is mixed. (I wasn’t able to confirm the status of srcset and WebGL in Android 4.4’s webview. Please add comments and I’ll update the table.) The requestAnimationFrame API gives smoother animations. Finally, adoption of the Nav Timing and Resource Timing APIs gives website owners the ability to track performance for websites served inside webviews.

Not out of the woods yet

While the newest webviews have a better performance profile, we’re still on the hook for supporting older webviews. Hybrid apps will continue to use the older webviews until they’re rebuilt and updated. The Android webview is pinned at Chromium 30 and requires an OS upgrade to get feature updates. Similar to the issues with legacy browsers, traffic from legacy webviews will continue for at least a year. Given the significant amount of traffic from webviews and the dramatic differences in webview performance, it’s important that developers measure performance on old and new webviews, and apply mobile performance best practices to make their website as fast as possible even on old webviews.

(Many thanks to Maximiliano Firtman, Tim Kadlec, Brian LeRoux, and Guy Podjarny for providing information for this post.)

1 Comment

Onload in Onload

September 12, 2014 7:58 am | 6 Comments

or “Why you should use document.readyState”

I asked several web devs what happens if an onload handler adds another onload handler. Does the second onload handler execute?

The onload event has already fired, so it might be too late for the second onload to get triggered. On the other hand, the onload phase isn’t over (we’re between loadEventStart and loadEventEnd in Navigation Timing terms), so there might be a chance the second onload handler could be added to a queue and executed at the end.

None of the people I asked knew the answer, but we all had a guess. I’ll explain in a minute why this is important, but until then settle on your answer – do you think the second onload executes?

To answer this question I created the Onload in Onload test page. It sets an initial onload handler. In that first onload handler a second onload handler is added. Here’s the code:

function addOnload(callback) {
    if ( "undefined" != typeof(window.attachEvent) ) {
        return window.attachEvent("onload", callback);
    }
    else if ( window.addEventListener ){
        return window.addEventListener("load", callback, false);
    }
}

function onload1() {
    document.getElementById('results').innerHTML += "First onload executed.";
    addOnload(onload2);
}

function onload2() {
    document.getElementById('results').innerHTML += "Second onload executed.";
}

addOnload(onload1);

I created a Browserscope user test to record the results and tweeted asking people to run the test. Thanks to crowdsourcing we have results from dozens of browsers. So far no browser executes the second onload handler.

Why is this important?

There’s increasing awareness of the negative impact scripts have on page load times. Many websites are following the performance best practice of loading scripts asynchronously. While this is a fantastic change that makes pages render more quickly, it’s still possible for an asynchronous script to make pages slower because onload doesn’t fire until all asynchronous scripts are done downloading and executing.

To further mitigate the negative performance impact of scripts, some websites have moved to loading scripts in an onload handler. The problem is that the scripts being moved to the onload handler are often third party scripts. Combine this with the fact that many third party scripts, especially metrics, kick off their execution via an onload handler. The end result is that we’re loading scripts that include an onload handler in an onload handler. We know from the test results above that the second onload handler will not execute, which means the third party script won’t complete all of its functionality.

Scripts (especially third party scripts) that use onload handlers should therefore check if the onload event has already fired. If it has, then rather than using an onload handler, the script execution should start immediately. A good example of this is my Episodes RUM library. Previously I initiated gathering of the RUM metrics via an onload handler, but now episodes.js also checks document.readyState to ensure the metrics are gathered even if onload has already fired. Here’s the code:

    if ( "complete" == document.readyState ) {
        // The page is ALREADY loaded - start EPISODES right now.
        if ( EPISODES.autorun ) {
            EPISODES.done();
        }
    }
    else {
        // Start EPISODES on onload.
        EPISODES.addEventListener("load", EPISODES.onload, false);
    }

Summing up:

  • If you own a website and want to make absolutely certain a script doesn’t impact page load times, consider loading the script in an onload handler. If you do this, make sure to test that the delayed script doesn’t rely on an onload handler to complete its functionality. (Another option is to load the script in an iframe, but third party scripts may not perform correctly from within an iframe.)
  • If you own a third party script that adds an onload handler, you might want to augment that by checking document.readyState to make sure onload hasn’t already fired, as shown in the sketch below.
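
Here’s a minimal, generic version of that check (the helper name is my own):

// Hypothetical helper: run callback now if the page has already
// loaded, otherwise run it on the load event.
function runOnLoadOrNow(callback) {
    if ( "complete" == document.readyState ) {
        callback(); // onload already fired
    }
    else if ( window.addEventListener ) {
        window.addEventListener("load", callback, false);
    }
    else if ( window.attachEvent ) {
        window.attachEvent("onload", callback); // IE8 and earlier
    }
}

A third party snippet can then call runOnLoadOrNow(init) and work correctly whether it was loaded during page construction or injected after onload.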

6 Comments

Resource Timing practical tips

August 21, 2014 12:45 am | 9 Comments

The W3C Web Performance Working Group brought us Navigation Timing in 2012 and it’s now available in nearly every major browser. Navigation Timing defines a JavaScript API for measuring the performance of the main page. For example:

// Navigation Timing
var t = performance.timing,
    pageloadtime = t.loadEventStart - t.navigationStart,
    dns = t.domainLookupEnd - t.domainLookupStart,
    tcp = t.connectEnd - t.connectStart,
    ttfb = t.responseStart - t.navigationStart;

Having timing metrics for the main page is great, but to diagnose real world performance issues it’s often necessary to drill down into individual resources. Thus, we have the more recent Resource Timing spec. This JavaScript API provides similar timing information as Navigation Timing but it’s provided for each individual resource. An example is:

// Resource Timing
var r0 = performance.getEntriesByType("resource")[0],
    loadtime = r0.duration,
    dns = r0.domainLookupEnd - r0.domainLookupStart,
    tcp = r0.connectEnd - r0.connectStart,
    ttfb = r0.responseStart - r0.startTime;

As of today, Resource Timing is supported in Chrome, Chrome for Android, Opera, IE10, and IE11. This likely covers more than 50% of your traffic, so it should provide enough data to uncover the slow-performing resources in your website.

Using Resource Timing seems straightforward, but when I wrote my first production-quality Resource Timing code I ran into several issues that I want to share. Here are my practical tips for tracking Resource Timing metrics in the real world.

1. Use getEntriesByType(“resource”) instead of getEntries().

You begin using Resource Timing by getting the set of resource timing performance objects for the current page. Many Resource Timing examples use performance.getEntries() to do this, which implies that only resource timing objects are returned by this call. But getEntries() can potentially return four types of timing objects: “resource”, “navigation”, “mark”, and “measure”.

This hasn’t caused problems for developers so far because “resource” is the only entry type in most pages. The “navigation” entry type is part of Navigation Timing 2, which isn’t implemented in any browser AFAIK. The “mark” and “measure” entry types are from the User Timing specification, which is available in some browsers but not widely used.

In other words, getEntriesByType("resource") and getEntries() will likely return the same results today. But it’s possible that getEntries() will soon return a mix of performance object types. It’s best to use performance.getEntriesByType("resource") so you can be sure to retrieve only resource timing objects. (Thanks to Andy Davies for explaining this to me.)

2. Use Nav Timing to measure the main page’s request.

When fetching a web page there is typically a request for the main HTML document. However, that resource is not returned by performance.getEntriesByType("resource"). To get timing information about the main HTML document you need to use the Navigation Timing object (performance.timing).

Although unlikely, this could cause bugs when there are no entries. For example, my earlier Resource Timing example used this code:

performance.getEntriesByType("resource")[0]

If the only resource for a page is the main HTML document, then getEntriesByType("resource") returns an empty array and referencing element [0] results in a JavaScript error. If you don’t have a page with zero subresources then you can test this on http://fast.stevesouders.com/.
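
A simple length check avoids the error:

// Guard against pages where the only resource is the main document.
var entries = performance.getEntriesByType("resource");
if ( entries.length ) {
    var r0 = entries[0];
    // safe to use r0 here
}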

3. Beware of issues with secureConnectionStart.

The secureConnectionStart property allows us to measure how long it takes for SSL negotiation. This is important – I often see SSL negotiation times of 500ms or more. There are three possible values for secureConnectionStart:

  • If this attribute is not available then it must be set to undefined.
  • If HTTPS is not used then it must be set to zero.
  • If this attribute is available and HTTPS is used then it must be set to a timestamp.

There are three things to know about secureConnectionStart. First, in Internet Explorer the value of secureConnectionStart is always “undefined” because it’s not available (the value is buried down inside WinINet).

Second, there’s a bug in Chrome that causes secureConnectionStart to be incorrectly set to zero. If a resource is fetched using a pre-existing HTTPS connection then secureConnectionStart is set to zero when it should really be a timestamp. (See bug 404501 for the full details.) To avoid skewed data make sure to check that secureConnectionStart is neither undefined nor zero before measuring the time for SSL negotiation:

var r0 = performance.getEntriesByType("resource")[0];
if ( r0.secureConnectionStart ) {
    var ssl = r0.connectEnd - r0.secureConnectionStart;
}

Third, the spec is misleading with regard to this line: “…if the scheme of the current page is HTTPS, this attribute must return the time immediately before the user agent starts the handshake process…” (emphasis mine). It’s possible for the current page to be HTTP and still contain HTTPS resources for which we want to measure the SSL negotiation. The spec should be changed to “…if the scheme of the resource is HTTPS, this attribute must return the time immediately before the user agent starts the handshake process…”. Fortunately, browsers behave according to the corrected language, in other words, secureConnectionStart is available for HTTPS resources even on HTTP pages.

4. Add “Timing-Allow-Origin” to cross-domain resources.

For privacy reasons there are cross-domain restrictions on getting Resource Timing details. By default, a resource from a domain that differs from the main page’s domain has these properties set to zero:

  • redirectStart
  • redirectEnd
  • domainLookupStart
  • domainLookupEnd
  • connectStart
  • connectEnd
  • secureConnectionStart
  • requestStart
  • responseStart

There are some cases when it’s desirable to measure the performance of cross-domain resources, for example, when a website uses a different domain for its CDN (such as “youtube.com” using “s.ytimg.com”) and for certain 3rd party content (such as “ajax.googleapis.com”). Access to cross-domain timing details is granted if the resource returns the Timing-Allow-Origin response header. This header specifies the list of (main page) origins that are allowed to see the timing details, although in most cases a wildcard (“*”) is used to grant access to all origins. As an example, the Timing-Allow-Origin response header for http://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js is:

Timing-Allow-Origin: *
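
If you control the server, adding the header is straightforward. Here’s a sketch for a simple Node.js server (one way among many):

var http = require("http");

http.createServer(function(req, res) {
    // Grant all origins access to this resource's timing details.
    res.setHeader("Timing-Allow-Origin", "*");
    res.end("response body goes here");
}).listen(8080);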

It’s great when 3rd parties add this response header; it allows website owners to measure and understand the performance of that 3rd party content on their pages. As reported by (and thanks to) Ilya Grigorik, several 3rd parties have added this response header. Here are some example resources that specify “Timing-Allow-Origin: *”:

When measuring Resource Timing it’s important to determine if you have access to the restricted timing properties. You can do that by testing whether any of the restricted properties listed above (except secureConnectionStart) is zero. I always use requestStart. Here’s a better version of the earlier code snippet that checks that the restricted properties are available before calculating the more detailed performance metrics:

// Resource Timing
var r0 = performance.getEntriesByType("resource")[0],
    loadtime = r0.duration;
if ( r0.requestStart ) {
    var dns = r0.domainLookupEnd - r0.domainLookupStart,
        tcp = r0.connectEnd - r0.connectStart,
        ttfb = r0.responseStart - r0.startTime;
}
if ( r0.secureConnectionStart ) {
    var ssl = r0.connectEnd - r0.secureConnectionStart;
}

It’s really important that you do these checks. Otherwise, if you assume you have access to these restricted properties you won’t get any errors; you’ll just get bogus data. Since the values are set to zero when access is restricted, things like “domainLookupEnd – domainLookupStart” translate to “0 – 0”, which returns a plausible result (“0”) but is not necessarily the true value of (in this case) the DNS lookup. This results in your metrics having an overabundance of “0” values which incorrectly skews the aggregate stats to look rosier than they actually are.

5. Understand what zero means.

As mentioned in #4, some Resource Timing properties are set to zero for cross-domain resources that have restricted access. And again, it’s important to check for that state before looking at the detailed properties. But even when the restricted properties are accessible it’s possible that the metrics you calculate will result in a value of zero, and it’s important to understand what that means.

For example, (assuming there are no access restrictions) the values for domainLookupStart and domainLookupEnd are timestamps. The difference of the two values is the time spent doing a DNS resolution for this resource. Typically, only one resource in the page will have a non-zero value for the DNS resolution of a given hostname. That’s because the DNS resolution is cached by the browser; all subsequent requests use that cached DNS resolution. And since DNS resolutions are cached across web pages, it’s possible for all DNS resolution calculations to be zero for an entire page. Bottom line: a zero value for DNS resolution means it was read from cache.

Similarly, the time to establish a TCP connection (“connectEnd – connectStart”) for a given hostname will be zero if a pre-existing TCP connection is being re-used. Each hostname typically has ~6 unique TCP connections, which show up as six non-zero TCP connection time measurements; all subsequent requests on that hostname re-use a pre-existing connection and thus have a TCP connection time of zero. Bottom line: a zero value for TCP connection means a pre-existing TCP connection was re-used.

The same applies to calculating the time for SSL negotiation (“connectEnd – secureConnectionStart”). This might be non-zero for up to six resources, but all subsequent resources from the same hostname will likely have an SSL negotiation time of zero because they use a pre-existing HTTPS connection.

Finally, if the duration property is zero this likely means the resource was read from cache.
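
Putting these interpretations together, here’s a sketch that labels each resource’s DNS and TCP times:

// Interpret zero values: cached DNS, re-used connections.
performance.getEntriesByType("resource").forEach(function(r) {
    if ( !r.requestStart ) {
        return; // cross-origin without Timing-Allow-Origin; skip
    }
    var dns = r.domainLookupEnd - r.domainLookupStart;
    var tcp = r.connectEnd - r.connectStart;
    console.log(r.name,
        dns ? "DNS: " + dns.toFixed(0) + "ms" : "DNS: cached",
        tcp ? "TCP: " + tcp.toFixed(0) + "ms" : "TCP: re-used");
});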

6. Determine if 304s are being measured.

There’s another bug in Chrome stable (version 36) that has been fixed in version 37. This issue is going away, but since most users are on Chrome stable your current metrics are likely different than they are in reality. Here’s the bug: a cross-origin resource that does have Timing-Allow-Origin on a 200 response will not have that taken into consideration on a 304 response. Thus, 304 responses will have zeroes for all the restricted properties as shown by this test page.

This shouldn’t happen because the Timing-Allow-Origin header from the cached 200 response should be applied (by the browser) to the 304 response. This is what happens in Internet Explorer. (Try the test page in IE 10 or 11 to confirm.) (Thanks to Eric Lawrence for pointing this out.)

This impacts your Chrome Resource Timing results as follows:

  • If you’re checking that the restricted fields are zero (as described in #4), then you’ll skip measuring these 304 responses. That means you’re only measuring 200 responses. But 200 responses are slower than 304 responses, so your Resource Timing aggregate measurements are going to be larger than they are in reality.
  • If you’re not checking that the restricted fields are zero, then you’ll record many zero values which are faster than the 304 response was in reality, and your Resource Timing stats will be too optimistic.

There’s no easy way to avoid these biases, so it’s good news that the bug has been fixed. One thing you might try is to send a Timing-Allow-Origin on your 304 responses. Unfortunately, the popular Apache web server is unable to send this header in 304 responses (see bug 51223). Further evidence of the lack of Timing-Allow-Origin in 304 responses can be found by looking at the 3rd party resources listed in #4. As stated, it’s great that these 3rd parties return that header on 200 responses, but 4 out of 5 of them do not return Timing-Allow-Origin on 304 responses. Until Chrome 37 becomes stable it’s likely that Resource Timing metrics are skewed either too high or too low because of the missing details for the restricted properties. Fortunately, the value for duration is accurate regardless.

7. Look at Boomerang.

If you’re thinking of writing your own Resource Timing code you should first look at the Resource Timing plugin for Boomerang. (The code is on GitHub.) Boomerang is the popular open source RUM package maintained by Philip Tellis. He originally open sourced it when he was at Yahoo! but is now providing ongoing maintenance and enhancements as part of his work at SOASTA for the commercial version (mPulse). The code is clear, pithy, and robust and addresses many of the issues mentioned above.

In conclusion, Navigation Timing and Resource Timing are outstanding new specifications that give website owners much more visibility into the performance of their web pages. Resource Timing is the newer of the two specs, and as such there are still some wrinkles to be ironed out. These tips will help you get the most from your Resource Timing metrics. I encourage you to start tracking these metrics today to understand how your website is performing for the people who matter the most: real users.

Update: Here are some other 3rd party resources that include the Timing-Allow-Origin response header, thus allowing website owners to measure the performance of this 3rd party content:

Update: Note that getEntriesByType(“resource”) and getEntries() do not include resources inside iframes. If the iframe is from the same origin, then the parent window can access these by using the iframe’s contentWindow.performance object.

9 Comments

Velocity highlights (video bonus!)

July 28, 2014 11:08 pm | Comments Off

We’re in the quiet period between Velocity Santa Clara and Velocity New York. It’s a good time to look back at what we saw and look forward to what we’ll see this September 15-17 in NYC.

Velocity Santa Clara was our biggest show to date. There was more activity across the attendees, exhibitors, and sponsors than I’d experienced at any previous Velocity. A primary measure of Velocity is the quality of the speakers. As always, the keynotes were livestreamed. The people who tuned in were not disappointed. I recommend reviewing all of the keynotes from the Velocity YouTube Playlist. All of them were great, but here were some of my favorites:

Virtual Machines, JavaScript and Assembler – Start. Here. Scott Hanselman’s walk through the evolution of the Web and cloud computing is informative and hilarious.
Lowering the Barrier to Programming – Pamela Fox works on the computer programming curriculum at Khan Academy. She also devotes time to Girl Develop It. This puts her in a good position to speak about the growing gap between the number of programmers and the number of programmer jobs, and how bringing more diversity into programming is necessary to close this gap.
Achieving Rapid Response Times in Large Online Services – Jeff Dean, Senior Fellow at Google, shares amazing techniques developed at Google for fast, scalable web services.
Mobile Web at Etsy – People who know Lara Swanson know the incredible work she’s done at Etsy building out their mobile platform. But it’s not all about technology. For a company to be successful it’s important to get cultural buy-in. Lara explains how Etsy achieved both the cultural and technical advances to tackle the challenges of mobile.
Build on a Bedrock of Failure – I want to end with this motivational cross-disciplinary talk from skateboarding icon Rodney Mullen. When you’re on the bleeding edge (such as skateboarding or devops), dealing with failure is a critical skill. Rodney talks about why people put themselves in this position, how they recover, and what they go on to achieve.

Now for the bonus! Some speakers have posted the videos of their afternoon sessions. These are longer, deeper talks on various topics. Luckily, some of the best sessions are available on YouTube:

Is TLS Fast Yet? – If you know performance then you know Ilya Grigorik. And if you know SPDY, HTTP/2, privacy, and security you know TLS is important. Here, the author of High Performance Browser Networking talks about how fast TLS is and what we can do to make it faster.
GPU and Web UI Performance: Building an Endless 60fps Scroller – Whoa! Whoa whoa whoa! Math?! You might not have signed up for it, but Diego Ferreiro takes us through the math and physics for smooth scrolling at 60 frames-per-second and his launch of ScrollerJS.
WebPagetest Power Users Part 1 and Part 2 – WebPagetest is one of the best performance tools out there. Pat Meenan, creator of WebPagetest, guides us through the new and advanced features.
Smooth Animation on Mobile Web, From Kinetic Scrolling to Cover Flow Effect – Ariya Hidayat does a deep dive into the best practices for smooth scrolling on mobile.
Encouraging Girls in IT: A How To Guide – Doug Ireton and his 7-year-old daughter, Jane Ireton, lament the lack of women represented in computer science and Jane’s adventure learning programming.

If you enjoy catching up using video, I recommend you watch these and other videos from the playlist. If you’re more of the “in-person” type, then I recommend you register for Velocity New York now. While you’re there, use my STEVE25 discount code for 25% off. I hope to see you in New York!

Comments Off

HTTP Archive – new stuff!

June 8, 2014 10:11 pm | 8 Comments

Background

The HTTP Archive crawls the world’s top 300K URLs twice each month and records detailed information like the number of HTTP requests, the most popular image formats, and the use of gzip compression. We also crawl the top 5K URLs on real iPhones as part of the HTTP Archive Mobile. In addition to aggregate stats, the HTTP Archive has the same set of data for individual websites plus images and video of the site loading.

I started the project in 2010 and merged it into the Internet Archive in 2011. The data is collected using WebPagetest. The code and data are open source. The hardware, mobile devices, storage, and bandwidth are funded by our generous sponsors: Google, Mozilla, New Relic, O’Reilly Media, Etsy, Radware, dynaTrace Software, Torbit, Instart Logic, and Catchpoint Systems.

For more information about the HTTP Archive see our About page.

New Stuff!

I’ve made a lot of progress on the HTTP Archive in the last two months and want to share the news in this blog post.

Github
A major change was moving the code to Github. It used to be on Google Code but Github is more popular now. There have been several contributors over the years, but I hope the move to Github increases the number of patches contributed to the project.
Histograms
The HTTP Archive’s trending charts show the average value of various stats over time. These are great for spotting performance regressions and other changes in the Web. But often it’s important to look beyond the average and see more detail about the distribution. As of today all the relevant trending charts have a corresponding histogram. For an example, take a look at the trending chart for Total Transfer Size & Total Requests and its corresponding histograms. I’d appreciate feedback on these histograms. In some cases I wonder if a CDF would be more appropriate.
Connections
We now plot the number of TCP connections that were used to load the website. (All desktop stats are gathered using IE9.) Currently the average is 37 connections per page.  
CDNs
We now record the CDN, if any, for each individual resource. This is currently visible in the CDNs section for individual websites. The determination is based on a reverse lookup of the IP address and a host-to-CDN mapping in WebPagetest.

Custom Metrics

Pat Meenan, the creator of WebPagetest, just added a new feature called custom metrics for gathering additional metrics using JavaScript. The HTTP Archive uses this feature to gather these additional stats:

Average DOM Depth
The complexity of the DOM hierarchy affects JavaScript, CSS, and rendering performance. I first saw average DOM depth as a performance metric in DOM Monster. I think it’s a good stat for tracking DOM complexity. 
Document Height
Web pages are growing in many ways such as total size and use of fonts. I’ve noticed pages also getting wider and taller so HTTP Archive now tracks document height and width. You can see the code here. Document width is more constrained by the viewport of the test agent and the results aren’t that interesting, so I only show document height. 
localStorage & sessionStorage
The use of localStorage and sessionStorage can help performance and offline apps, so the HTTP Archive tracks both of these. Right now the 95th percentile is under 200 characters for both, but watch these charts over the next year. I expect we’ll see some growth.
Iframes
Iframes are used frequently to contain third party content. This will be another good trend to watch.
SCRIPT Tags
The HTTP Archive has tracked the number of external scripts since its inception, but custom metrics allows us to track the total number of SCRIPT tags (inline and external) in the page.
Doctype
Specifying a doctype affects quirks mode and other behavior of the page. Based on the latest crawl, 14% of websites don’t specify a doctype, and “html” is the most popular doctype at 40%. Here are the top five doctypes.

Percentage  Doctype
40%         html
31%         html -//W3C//DTD XHTML 1.0 Transitional//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
14%         [none]
8%          html -//W3C//DTD XHTML 1.0 Strict//EN http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
3%          HTML -//W3C//DTD HTML 4.01 Transitional//EN http://www.w3.org/TR/html4/loose.dtd

Some of these new metrics are not yet available in the HTTP Archive Mobile but we’re working to add those soon. They’re available as histograms currently, but once we have a few months of data I’ll add trending charts, as well.
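
To give a flavor of how custom metrics are written, here’s a hedged sketch along the lines of the document height metric. A WebPagetest custom metric is a named snippet of JavaScript whose return value is recorded for each test run; the snippet below is illustrative, not the HTTP Archive’s actual code:

// [document-height]
// Runs in the page; the return value is recorded as the metric.
return Math.max(
    document.documentElement ? document.documentElement.scrollHeight : 0,
    document.body ? document.body.scrollHeight : 0
);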

What’s next?

Big ticket items on the HTTP Archive TODO list include:

  • easier private instance – I estimate there are 20 private instances of HTTP Archive out there today (see here, here, here, here, and here). I applaud these folks because the code and documentation don’t make it easy to set up a private instance. There are thousands of WebPagetest private instances in the world. I feel that anyone running WebPagetest on a regular basis would benefit from storing and viewing the results in HTTP Archive. I’d like to lower the bar to make this happen.
  • 1,000,000 URLs – We’ve increased from 1K URLs at the beginning four years ago to 300K URLs today. I’d like to increase that to 1 million URLs on desktop. I also want to increase the coverage on mobile, but that’s probably going to require switching to emulators.
  • UI overhaul – The UI needs an update, especially the charts.

In the meantime, I encourage you to take a look at the HTTP Archive. Search for your website to see its performance history. If it’s not there (because it’s not in the top 300K) then add your website to the crawl. And if you have your own questions you’d like answered then try using the HTTP Archive dumps that Ilya Grigorik has exported to Google BigQuery and the examples from bigqueri.es.

8 Comments

MySQL dumps

May 8, 2014 9:19 pm | 2 Comments

As part of the HTTP Archive project, I create MySQL dumps for each crawl (on the 1st and 15th of each month). You can access the list of dumps from the downloads page. Several people use these dumps, most notably Ilya Grigorik who imports the data into Google BigQuery.

For the last year I’ve hesitated on many feature requests because they require schema changes. I wasn’t sure how changing the schema would affect the use of the dump files that preceded the change. This blog post summarizes my findings.

Format

When I started the HTTP Archive all the dumps were exported in MySQL format using a command like the following:

mysqldump --opt --skip-add-drop-table -u USERNAME -p -h SERVER DBNAME TABLENAME | gzip > TABLENAME.gz

These MySQL formatted dump files are imported like this:

gunzip -c TABLENAME.gz | mysql -u USERNAME -p -h SERVER DBNAME

People using databases other than MySQL requested that I also export in CSV format. The output of this export command is two files: TABLENAME.txt and TABLENAME.sql. The .txt file is CSV formatted and can be gzipped with a separate command.

mysqldump --opt --complete-insert --skip-add-drop-table -u USERNAME -p -h SERVER -T DIR DBNAME TABLENAME
gzip -c DIR/TABLENAME.txt > DIR/TABLENAME.csv.gz

This CSV dump is imported like this:

gunzip DIR/TABLENAME.csv.gz
mysqlimport --local --fields-optionally-enclosed-by="\"" --fields-terminated-by=, --user=USERNAME -p DBNAME DIR/TABLENAME.csv

The largest HTTP Archive dump file is ~25G unzipped and ~3G gzipped. This highlights a disadvantage of using CSV formatted dumps: there’s no way to gzip and ungzip in memory. This is because the mysqlimport command uses the filename to determine which table to use – if you piped in the rows then it wouldn’t know the table name. Unzipping a 25G file can be a challenge if disk space is limited.

On the other hand, the CSV import is ~30% faster than using the MySQL format file. This can save over an hour when importing 30 million rows. The HTTP Archive currently provides dumps in both MySQL and CSV format so people can choose between less disk space or faster imports.

Forward Compatibility

My primary concern is with the flexibility of previously-generated dump files in light of later schema changes – namely adding and dropping columns.

Dump files in MySQL format work fine with added columns. The INSERT commands in the dump are tied to specific column names, so the new columns are simply ignored. CSV formatted dumps are less flexible. The values in a row are stuffed into the table’s columns in order. If a new column is added at the end, everything works fine. But if a column is added in the middle of the existing columns, the row values will all shift one column to the left.

Neither format works well with dropped columns. MySQL formatted files will fail with an “unknown column” error. CSV formatted files will work but all the columns will be shifted, this time to the right.

Takeaways

I now feel comfortable making schema changes without invalidating the existing dump files provided I follow these guidelines:

  • don’t drop columns – If a column is no longer needed, I’ll leave it in place and modify the column definition to be a small size.
  • add columns at the end – I prefer to organize my columns semantically, but all new columns from this point forward will be added at the end.

I’ll continue to create dumps in MySQL and CSV format. These guidelines ensure that all past and future dump files will work against the latest schema.

2 Comments

Unexpected prerender in Chrome

April 30, 2014 2:55 pm | 12 Comments

Over the last week I’ve been investigating the cacheability of resources from Shopify. I would visit the page every few hours and track the number of 200 OK versus 304 Not Modified responses. To my surprise, Chrome’s Network tab indicated that almost all the responses were “from cache”.

This didn’t make sense. In many cases the resource URLs changed between test loads. How could a never-before-seen URL be “from cache”? In cases where the URL was the same, I noticed that the Date response header had changed from the previous test but Chrome still marked it “from cache”. How could the Date change without a 200 response status code?

I started thinking about my “Prebrowsing” work (blog post, slides, video). In my findings I talk about how browsers, especially Chrome, are doing more work in anticipation of what the user needs next. This proactive work includes doing DNS lookups, establishing TCP connections, downloading resources, and even prerendering entire pages.

Was it possible that Chrome was prerendering the entire page?

I started by looking at chrome://predictors. Given characters entered into Omnibox (the location field), this shows which URL you’re predicted to go to. In my tests, I had always typed the URL into the location bar, so the predictions for “shopify” could affect Chrome’s behavior in my tests. Here’s what I found in chrome://predictors:

chrome-predictors-shopify

Chrome predicted that if I entered “www.s” into the Omnibox I would end up going to “http://www.shopify.com/” with confidence 1.0 (as shown in the rightmost column). In fact, just typing “ww” had a 0.9 confidence of ending up on Shopify. In other words, Chrome had developed a deep history mapping my Omnibox keystrokes to the Shopify website, as indicated by rows colored green.

From my Prebrowsing research I knew that if the chrome://predictors confidence was high enough, Chrome would pre-resolve DNS and pre-connect TCP. Perhaps it was possible that Chrome was also proactively sending HTTP requests before they were actually needed. To answer this I opened Chrome’s Net panel and typed “www.s” in the Omnibox but never hit return. Instead, I just sat there and waited 10 seconds. But nothing showed up in Chrome’s Net panel:

shopify-omnibox

Suspecting that these background requests might not show up in Net panel, I fired up tcpdump and repeated the test – again only typing “www.s” and NOT hitting return. I uploaded the pcap file to CloudShark and saw 86 HTTP requests!

cloudshark-shopify

I looked at individual requests and saw that they were new URLs that had never been seen before but were in the HTML document. This confirmed that Chrome was prerendering the HTML document (as opposed to prefetching individual resources based on prior history). I was surprised that no one had discovered this before, so I went back to High Performance Networking in Google Chrome by Ilya Grigorik and scanned the Omnibox section:

the yellow and green colors for the likely candidates are also important signals for the ResourceDispatcher! If we have a likely candidate (yellow), Chrome may trigger a DNS pre-fetch for the target host. If we have a high confidence candidate (green), then Chrome may also trigger a TCP pre-connect once the hostname has been resolved. And finally, if both complete while the user is still deliberating, then Chrome may even pre-render the entire page in a hidden tab.

Takeaways

What started off as a quick performance analysis turned into a multi-day puzzler. The puzzle’s solution yields a few takeaways:

  • Remember that Chrome may do DNS prefetch, TCP pre-connect, and even prerender the entire page based on the confidences in chrome://predictors.
  • Not all HTTP requests related to a page or tab are shown in Chrome’s Net panel. I’d like to see this fixed, perhaps with an option to show these behind-the-scenes requests.
  • Ilya knows everything. Re-read his posts, articles, and book before running multi-day experiments.

12 Comments

fast, faster, Fastly

March 10, 2014 10:56 am | 16 Comments

Today I join Fastly as Chief Performance Officer. Read more in Fastly’s announcement.

I’m excited to get back to the world of startups. People who have known me for less than 15 years don’t know that side of me. In my early career I worked at Advanced Decision Systems, Helix Systems (co-founder), General Magic, WhoWhere?, and CoolSync (co-founder) – companies that ranged from 3 to 300 employees.

I went to Yahoo! with the intention of staying for a few years to have some good healthcare. I loved it so much I stayed for eight years, working with people like Doug Crockford, Stoyan Stefanov, Nicholas Zakas, Nicole Sullivan, Tenni Theurer, Philip Tellis, and many more incredible developers. It was there that Ash Patel and Geoff Ralston asked me to start a team focused on performance. As visionary executives, they believed there was a set of performance best practices that would be applicable to all the Yahoo! properties. It turned out those best practices apply to nearly every site on the Web.

Knowing Google’s culture of performance I was excited to continue my work there over the past six years. Google is an amazing company. I want to loudly thank Yahoo! and Google for giving me the opportunity to focus on web performance. They are responsible for sharing this web performance research, tools, and code with the web community. It’s important that we have companies like these to move the Web forward.

Many of Google’s performance projects, such as Chrome, SPDY, WebP, and Public DNS, focus on improving the Web’s infrastructure. These infrastructure projects have a dramatic impact – they help raise the tide for all websites. That’s half of the equation. The other half lies with how websites are built. Even on this fast infrastructure it’s still possible to build a slow website.

That’s why I’m excited to join Fastly where I’ll be able to engage with Fastly’s customers to produce websites that are blazingly fast. Fastly’s CDN platform is built with latest generation technology and software. I hope to add more performance to an already fast network, and make it go even fastlier.

16 Comments