Announcing the HTTP Archive

March 30, 2011 5:44 pm | 11 Comments

I’m proud to announce the release of the HTTP Archive. From the mission statement:

Successful societies and institutions recognize the need to record their history – this provides a way to review the past, find explanations for current behavior, and spot emerging trends. In 1996 Brewster Kahle realized the cultural significance of the Internet and the need to record its history. As a result he founded the Internet Archive which collects and permanently stores the Web’s digitized content.

In addition to the content of web pages, it’s important to record how this digitized content is constructed and served. The HTTP Archive provides this record. It is a permanent repository of web performance information such as size of pages, failed requests, and technologies utilized. This performance information allows us to see trends in how the Web is built and provides a common data set from which to conduct web performance research.

The HTTP Archive code is open source and the data is downloadable. Approximately 17,000 top websites are examined every two weeks. I started gathering the data in October 2010. The list of URLs is derived from various sources including Alexa, Fortune 500, and Quantcast. The system is built on the shoulders of WebPagetest which downloads each URL and gathers a HAR file, screenshots, video of the site loading, and other information. From this information the HTTP Archive extracts data, stores it in a database, aggregates the data, and provides various statistical analyses.

This chart is one of the many interesting stats from the HTTP Archive. It shows the number of bytes downloaded for the average web page broken out by content type.

In addition to interesting stats for an individual run, the HTTP Archive also has trending charts. For example, this chart shows the total transfer size of web pages has increased 88 kB (15%) over the last six months. (In order to compare apples-to-apples, since the list of top sites can change month to month, these comparisons are done using the intersection of ~1100 URLs analyzed in every run.)
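Here's a minimal sketch of that apples-to-apples comparison. It assumes each run is a plain object mapping a URL to its total transfer size in bytes. That shape and the name compareRuns are mine for illustration; they are not the HTTP Archive schema or code.

```javascript
// Hypothetical data shape: each run maps a page URL to its total transfer size in bytes.
// This only illustrates the intersection idea; it is not the HTTP Archive's own code.
function compareRuns(firstRun, lastRun) {
  const has = (run, url) => Object.prototype.hasOwnProperty.call(run, url);
  // Keep only URLs present in both runs so both averages cover the same pages.
  const shared = Object.keys(firstRun).filter((url) => has(lastRun, url));
  if (shared.length === 0) return null;

  const average = (run) =>
    shared.reduce((sum, url) => sum + run[url], 0) / shared.length;

  const before = average(firstRun);
  const after = average(lastRun);
  return {
    urls: shared.length,
    before,
    after,
    change: after - before,
    percent: ((after - before) / before) * 100,
  };
}
```

Sites that show up in only one of the two runs are ignored, so a change in the list of top sites can't skew the trend.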

The likely cause for this increase is the size of images. The next chart shows that the total transfer size of images increased 61 kB (22%) over the same time period, even though the total number of image requests only increased by two (6%). Why did the size of images grow so much? This highlights one of the benefits of the HTTP Archive. Anyone who wants to answer that question can download the data and perform an analysis. Possible things to check include whether the ratio of JPG images increased (since the average size of JPG images is 14 kB compared to 3 kB for GIF and 8 kB for PNG) and whether the increase occurs across most images in the page or in a few larger images.
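As a starting point for that kind of analysis, here's a rough sketch that tallies image responses by MIME type from a parsed HAR object. The field names (log.entries, response.bodySize, response.content.mimeType) come from the HAR 1.2 format. The function name imageStatsByType is just an illustration and is not part of the HTTP Archive code.

```javascript
// Summarize the image responses in a parsed HAR object by MIME type.
// Field names follow the HAR 1.2 format; the function itself is illustrative.
function imageStatsByType(har) {
  const stats = {};
  for (const entry of har.log.entries) {
    const rawType = entry.response.content.mimeType || "";
    // Drop any parameters such as "; charset=..." and normalize case.
    const mime = rawType.split(";")[0].trim().toLowerCase();
    if (!mime.startsWith("image/")) continue;
    // bodySize is -1 when the size is unknown; count such responses as 0 bytes.
    const bytes = Math.max(entry.response.bodySize, 0);
    const s = stats[mime] || (stats[mime] = { requests: 0, bytes: 0 });
    s.requests += 1;
    s.bytes += bytes;
  }
  for (const mime of Object.keys(stats)) {
    stats[mime].averageBytes = stats[mime].bytes / stats[mime].requests;
  }
  return stats;
}
```

Running this over the HAR files from two different runs and comparing the request counts per type would show whether the mix of JPG, GIF, and PNG shifted.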

There are hours of slicing and dicing in store for performance engineers and anyone who loves data. About 20 charts are available today. More will come (add suggestions to the list of issues). The most important thing to remember is that the data is being collected and archived, and will be accessible to new views that are added in the future.

Check out the FAQ for more information. I encourage you to join the HTTP Archive group in order to follow our progress and add to the discussion. Once there you’ll see the first email from Brewster Kahle who says “Fantastic! I have not seen such a wealth of information on the performance of the web recorded and so enriched.” I hope you too will find it worthwhile.


Storager case study: Bing, Google

March 28, 2011 9:26 pm | 23 Comments


Last week I posted my mobile comparison of 11 top sites. One benefit of analyzing top websites is finding new best practices. In that survey I found that the mobile version of Bing used localStorage to reduce the size of their HTML document from ~200 kB to ~30 kB. This is a good best practice in general and makes even more sense on mobile devices where latencies are higher, caches are smaller, and localStorage is widely supported.

I wanted to further explore Bing’s use of localStorage for better performance. One impediment is that there’s no visibility into localStorage on a mobile device. So I created a new bookmarklet, Storager, and added it to the Mobile Perf uber bookmarklet. (In other words, just install Mobile Perf – it bundles Storager and other mobile bookmarklets.)

Storager lets you view, edit, clear, and save localStorage for any web page on any browser – including mobile. Viewing localStorage on a 320×480 screen isn’t ideal, so I did the obvious next step and integrated Storager with Jdrop. With these pieces in place I’m ready to analyze how Bing uses localStorage.
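To give a feel for the "view" part, here's a minimal sketch that lists every entry in a Storage object along with its length. It uses only the standard Storage members length, key(), and getItem(). The name dumpStorage is made up for this example and this is not Storager's actual code.

```javascript
// List every entry in a Storage object (such as window.localStorage) with its length.
// Only the standard Storage members length, key() and getItem() are used.
function dumpStorage(storage) {
  const entries = [];
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    const value = storage.getItem(key);
    // Lengths are counted in characters of the stored string.
    entries.push({ key, value, chars: value.length });
  }
  const totalChars = entries.reduce((sum, e) => sum + e.chars, 0);
  return { entries, totalChars };
}
```

In a browser you would call dumpStorage(window.localStorage) and serialize the result with JSON.stringify.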

Bing localStorage

My investigation begins by loading Bing on my mobile device – after the usual redirects I end up at the URL http://m.bing.com/?mid=10006. Opening Storager from the Mobile Perf bookmarklet I see that localStorage has ~10 entries. Since I’m not sure when these were written to localStorage I clear localStorage (using Storager) and hit reload. Opening Storager again I see the same ~10 entries and save those to Jdrop. I show the truncated entries below. I made the results public so you can also view the Storager results in Jdrop.

BGINFO: {"PortraitLink":"http://www.bing.com/fd/hpk2/Legzira_EN-US262...
CApp.Home.FD66E1A3: #ContentBody{position:relative;overflow:hidden;height:100%;-w...
CUX.Keyframes.B8625FE...: @-webkit-keyframes scaleout{from{-webkit-transform:scale3d(1,...
CUX.Site.18BDD936: *{margin:0;padding:0}table{border-collapse:separate;border-sp...
CUX.SiteLowRes.C8A1DA...: .blogoN{background-image:url(data:image/png;base64,iVBORw0KGg...
JApp.Home.DE384EBF: (function(){function a(){Type.registerNamespace("SS");SS.Home...
JUX.Compat.0907AAD4: function $(a){return document.getElementById(a)}var FireEvent...
JUX.FrameworkCore.A39...: (function(){function a(){Type.registerNamespace("BM");AjaxSta...
JUX.MsCorlib.172D90C3: window.ss={version:"0.6.1.0",isUndefined:function(a){return a...
JUX.PublicJson.540180...: if(!this.JSON)this.JSON={};(function(){function c(a){return a...
JUX.UXBaseControls.25...: (function(){function a(){Type.registerNamespace("UXControls")...
RMSM.Keys: CUX.Site.18BDD936~CUX.Keyframes.B8625FEE~CApp.Home.FD66E1A3~C...

These entries are written to localStorage as part of downloading the Bing search page, and they add up to ~170 kB in size (uncompressed). This would explain the large size of the Bing HTML document on mobile. We can verify that these keys are downloaded via the HTML document by searching for a unique string from the data such as “FD66E1A3”. We find this string in the Bing document source (saved in Jdrop) as the id of a STYLE block:

<style data-rms="done" id="CApp.Home.FD66E1A3" rel="stylesheet" type="text/css">
#ContentBody{position:relative;overflow:hidden;height:100%;-webkit-tap-highlight-color:...

Notice how the content of this STYLE block matches the data in localStorage. The other localStorage entries also correspond to SCRIPT and STYLE blocks in the initial HTML document. Bing writes these blocks to localStorage and then on subsequent page views reads them back and inserts them into the document, resulting in a much smaller HTML document download size. The Bing server knows which blocks are in the client’s localStorage via a cookie whose value consists of the localStorage keys delimited by “~”:

RMSM=JApp.Home.DE384EBF~JUX.UXBaseControls.252CB7BF~JUX.FrameworkCore.A39F6425~
JUX.PublicJson.540180A4~JUX.Compat.0907AAD4~JUX.MsCorlib.172D90C3~CUX.SiteLowRes.C8A1DA4E~
CApp.Home.FD66E1A3~CUX.Keyframes.B8625FEE~CUX.Site.18BDD936~;

Just to be clear, everything above happens during the loading of the blank Bing search page. Once a query is issued the search results page downloads more keys (~95 kB additional data) and expands the cookie with the new key names.
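The mechanism can be sketched roughly like this. This is my own illustration of the pattern, not Bing's code: the function names stashBlocks and blocksToInline and the { id, text } block shape are hypothetical. Only the "~"-delimited key list mirrors the RMSM cookie value shown above.

```javascript
// Client side: save each block under its id and return the key list with a "~"
// after each key, mirroring the format of the RMSM cookie value shown above.
function stashBlocks(storage, blocks) {
  const saved = [];
  for (const block of blocks) {
    try {
      storage.setItem(block.id, block.text);
      saved.push(block.id);
    } catch (e) {
      // Storage may be full or disabled; skip the block so the server keeps sending it.
    }
  }
  return saved.map((id) => id + "~").join("");
}

// Server side: given the cookie value, list the block ids the client does not have yet.
function blocksToInline(cookieValue, allIds) {
  const have = new Set(cookieValue.split("~").filter(Boolean));
  return allIds.filter((id) => !have.has(id));
}
```

With this split, the server only inlines the blocks whose keys are missing from the cookie, which is how the HTML document can shrink on later page views.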

Google localStorage

Another surprise from last week’s survey was that the mobile version of Google Search had 68 images in the results HTML document as data: URIs, compared to only 10 for desktop and iPad. Mobile browsers open fewer TCP connections and these connections are typically slower compared to desktop, so reducing the number of HTTP requests is important.

The additional size from inlining data: URIs doesn’t account for the large size of the Google Search results page, so perhaps localStorage is being seeded here, too. Using Storager we see over 130 entries in localStorage after a search for flowers. Here’s a sample. (As before, the key names and values may be truncated.)

 mres.-8Y5Dw_nSfQztyYx: <style>a{color:#11c}a:visited{color:#551a8b}body{margin:0;pad...
 mres.-Kx7q38gfNkQMtpx: <script> //<![CDATA[ var Zn={},bo=function(a,b){b&&Zn[b]||(ne...
 mres.0kH3gDiUpLA5DKWN: <style>.zl9fhd{padding:5px 0 0}.sc59bg{clear:both}.pyp56b{tex...
 mres.0thHLIQNAKnhcwR4: <style>.fdwkxt{width:49px;height:9px;background:url("data:ima...
 mres.36ZFOahhhEK4t3WE: <script> //<![CDATA[ var kk,U,lk;(function(){var a={};U=funct...
 mres.3lEpts5kTxnI2I5S: <script> //<![CDATA[ var Ec,Fc,Gc=function(a){this.Jl=a},Hc="...
 mres.4fbdvu9mdAaBINjE: <script> //<![CDATA[ u("_clOnSbt",function(){var a=document.g...
 mres.5QIb-AahnDgEGlYP: <script> //<![CDATA[ var cb=function(a){this.Cc=a},db=/\s*;\s...
 mres:time.-8Y5Dw_nSfQ...: 1301368541872
 mres:time.-Kx7q38gfNk...: 1301368542755
 mres:time.0kH3gDiUpLA...: 1301368542257
 mres:time.0thHLIQNAKn...: 1301368542223
 mres:time.36ZFOahhhEK...: 1301368542635
 mres:time.3lEpts5kTxn...: 1301368542579
 mres:time.4fbdvu9mdAa...: 1301368542720
 mres:time.5QIb-AahnDg...: 1301368542856

Searching the search results docsource for a unique key such as “8Y5D” we find:

<style id="r:-8Y5Dw_nSfQztyYx" type="text/css">
a{color:#11c}a:visited{color:#551a8b}body{margin:0;padding:0}...

Again we see that multiple SCRIPT and STYLE blocks are being saved to localStorage totaling 154 kB. On subsequent searches the HTML document size drops from the initial size of 220 kB uncompressed (74 kB compressed) to 67 kB uncompressed (16 kB compressed). In addition to the key names being saved in a cookie, it appears that an epoch time (in milliseconds) is associated with each key.
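Here's a small sketch of how a block and its companion timestamp could be written and read back using the "mres." and "mres:time." key pattern seen above. The data doesn't show what Google actually does with the timestamp, so the freshness check is purely my assumption, and the function names are hypothetical.

```javascript
// Save a block plus a companion timestamp entry, following the "mres." and
// "mres:time." key pattern seen above. Expiring old entries is only one
// plausible use of the timestamp; it is an assumption, not observed behavior.
function saveWithTime(storage, id, text, now) {
  storage.setItem("mres." + id, text);
  storage.setItem("mres:time." + id, String(now));
}

// Return the cached text, or null if it is missing or older than maxAgeMs.
function loadIfFresh(storage, id, now, maxAgeMs) {
  const text = storage.getItem("mres." + id);
  const savedAt = storage.getItem("mres:time." + id);
  if (text === null || savedAt === null) return null;
  return now - Number(savedAt) <= maxAgeMs ? text : null;
}
```

Passing the current time in as a parameter (for example Date.now()) keeps both functions easy to test.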

Conclusion

Bing and Google Search make extensive use of localStorage for stashing SCRIPT and STYLE blocks that are used on subsequent page views. None of the other top sites from my previous post use localStorage in this way. Are Bing and Google Search onto something? Yes, definitely. As I pointed out in my previous post, this is another example of a performance best practice that is used on a top mobile site but is not in the recommendations from Page Speed or YSlow. Many of the performance best practices that I’ve evangelized over the last six years for desktop apply to mobile, but I believe there are specific mobile best practices that we’re just beginning to identify. I’ve started using “High Performance Mobile” as the title of future presentations. Another book? hmmm….


Mobile comparison of Top 11

March 14, 2011 8:01 pm | 7 Comments

A few weeks ago I announced Jdrop while at Webstock. After Webstock I took some vacation time on Waiheke Island. Now that I’m back I wanted to show how Jdrop is useful for doing mobile analysis.

Preamble

As a reminder, Jdrop is a JSON repository in the cloud. Jdrop itself doesn’t do anything specific for mobile analysis. But what it does do is provide a way to gather information from mobile devices and analyze it later from your desktop where you have more screen real estate. A bigger screen is important when you’re looking at things like document source and lists of resources.

The apps that integrate with Jdrop are what actually gather the data and do the analysis. Right now there are three main apps integrated with Jdrop (all of them are bookmarklets): Page Resources, Docsource, and DOM Monster. Rather than add each of these bookmarklets individually to your mobile browser, you can get them all just by adding my uber Mobile Perf bookmarklet.

Actual Analysis

Using the Mobile Perf bookmarklet I analyzed 11 top U.S. sites on the desktop (Firefox 3.6), iPad, and iPhone. On the desktop and iPad I used my Mom’s home wifi; iPhone was over AT&T 3G.

For each site and platform I gathered the number of HTTP requests (reqs), the size of the HTML document (HTML), and the number of DOM elements (elems):

Overall Averages
platform   reqs   HTML     elems
desktop    31     110 kB   1166
iPad       28     86 kB    908
iPhone     15     75 kB    567

The overall results are interesting. The iPhone versions of these websites are about half the size of their desktop counterparts in terms of number of requests and number of DOM elements, but are 75% when it comes to the size of the HTML document. The increased relative size of HTML documents is an interesting artifact I explore later in this post. The iPad averages fall in between desktop and iPhone – not a surprise given its bigger screen but limited networking and processing power.

A recent feature added to Jdrop is the ability to make data public. I gathered all of this data using Jdrop and made it public which allows me to link directly to the raw data so you can see it as part of the analysis that follows. You can take a look at the Public JSON page on Jdrop and use the filter to search for any other data you’re interested in.

Let the analysis begin.

Google Search

Google Search
platform   reqs   HTML     elems
desktop    19     199 kB   1456
iPad       15     148 kB   1105
iPhone     6      149 kB   855

Doing a Google search for flowers shows it has versions customized for the iPad as well as the iPhone. The iPad version is about 75% of the size of the desktop version. The iPhone version, on the other hand, is much smaller than the desktop version: 30% as many requests, 75% of the HTML, and 60% as many DOM elements.

It’s interesting that the ratio of requests between the iPhone and desktop versions is so small but the ratio of HTML size is so high. My first guess was data: URIs. Using data: URIs would reduce the number of requests but increase the size of the HTML document. Sure enough, the iPhone document source has 68 images specified as data: URIs, whereas the iPad and desktop versions only have 10.

This is a great example of how the relevance of performance best practices differs between desktop and mobile.

We’ve known about data: URIs as a performance best practice for years. I first blogged about them back in April of 2007. But neither Page Speed nor YSlow has a rule that recommends using data: URIs. And yet here they’re used extensively for mobile by one of the most successful websites in the world. Data: URIs might not be a high priority on the desktop where latency is lower and browsers can open more simultaneous connections, but they’re worth another look if you’re trying to speed up your mobile website.
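For anyone who hasn't built one by hand, here's a minimal sketch of turning raw image bytes into a data: URI, plus the size cost of doing so. The function names are mine. btoa() is a standard browser function and is also available as a global in recent Node.js versions.

```javascript
// Build a data: URI from raw bytes so a small image can be inlined in HTML or CSS.
// btoa() is available in browsers and as a global in recent Node.js versions.
function toDataUri(bytes, mimeType) {
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return "data:" + mimeType + ";base64," + btoa(binary);
}

// Base64 encodes every 3 bytes as 4 characters, so the inlined form is about a third larger.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}
```

That one-third overhead is the tradeoff behind the numbers above: fewer HTTP requests in exchange for a bigger HTML document.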

Bing

Bing
platform   reqs   HTML     elems
desktop    19     93 kB    988
iPad       19     59 kB    889
iPhone     16     198 kB   374

The number of HTTP requests made for a Bing search for flowers is similar across desktop, iPad, and iPhone: 19, 19, and 16. The iPad version is slightly smaller than the desktop version in terms of HTML size (93kB vs. 59kB) and number of DOM elements (988 vs. 889).

The big surprise here is the size of the iPhone HTML document. At 198kB it’s 2x bigger than the desktop version and 4x bigger than the iPad HTML size, even though its number of DOM elements is less than half of the other versions.

Why is the iPhone HTML document so much bigger than the desktop and iPad versions? This was a bit harder to figure out. My first guess was data: URIs, just like Google Search, but there aren’t any data: URIs in the Bing docsource for iPhone.

I had to unravel this one by looking at the actual response sent over the network. The easiest way to do that is to change my desktop browser user agent to iPhone. (I never do this the first time I’m analyzing a mobile website, but in this case I already have the docsource, page resources, and number of DOM elements, so I can make sure the version I get by changing my user agent matches what I originally received on my mobile device.) I was surprised to see that the size of the HTML document response is ~30K uncompressed (~9K compressed). So why does the Docsource bookmarklet report it as 198kB?

The Docsource bookmarklet uses the document.documentElement.innerHTML property. That property is ~198kB in length. So something is causing the ~30kB response to expand to ~198kB of HTML. Diffing the ~30kB network response with documentElement.innerHTML highlights calls to the RMS._load function:

_load:function(b){
    var a=null;
    if(localStorage)
        try{a=localStorage.getItem(b)}
    catch(c){}
    return a
}

Sure enough, if you clear your localStorage and repeat the search the network transfer size of the HTML document source is ~200kB. Some of the blocks of that HTML document are saved to localStorage so they can be retrieved from there on subsequent searches thus reducing the size of the HTML document. Very nice! You can study the Bing response and see how they structured various STYLE blocks to make it easy to extract that data and save it to localStorage.
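The read side of this pattern might look something like the sketch below. It reuses the defensive try/catch shape of the RMS._load function above, but restoreBlocks and its return shape are hypothetical names of mine, not anything found in Bing's source.

```javascript
// Read a cached block, returning null if storage is unavailable or throws
// (the same defensive pattern as the RMS._load function above).
function loadBlock(storage, key) {
  if (!storage) return null;
  try {
    return storage.getItem(key);
  } catch (e) {
    return null;
  }
}

// Split the expected keys into blocks that can be restored locally and keys
// that still have to arrive from the server in the HTML response.
function restoreBlocks(storage, keys) {
  const restored = {};
  const missing = [];
  for (const key of keys) {
    const text = loadBlock(storage, key);
    if (text === null || text === undefined) missing.push(key);
    else restored[key] = text;
  }
  return { restored, missing };
}
```

When localStorage has been cleared, every key lands in missing, which corresponds to the ~200kB first-view response described above.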

Yahoo Search

Yahoo Search
platform   reqs   HTML     elems
desktop    24     174 kB   2221
iPad       25     83 kB    967
iPhone     25     38 kB    387

In terms of HTML size and number of DOM elements, the iPad and iPhone versions of a Yahoo search for flowers are progressively smaller than the desktop version. The iPad version is about half of the desktop version, and the iPhone is less than a quarter of the desktop version.

Although the iPad and iPhone versions are progressively smaller than the desktop version in terms of HTML size and number of DOM elements, all three versions have about 24-25 HTTP requests. This is counterintuitive – one would think that the smaller screen sizes would have fewer requests. The bulk of these requests are images: the desktop page resources has 17 images, the iPad has 19 images, and iPhone has 15 images. So the iPhone does have fewer images, and those images are generally smaller.

The big difference comes in terms of CSS background images. Both the desktop and iPad versions have 1 CSS background image, compared with 7 for the iPhone version. The iPhone’s CSS background images contain two named “core-spring.png”. They have different version numbers (v=9 and v=12), but are similar. Two of the background images (“os_mag_glass.png” and “arrow_right2.png”) have similar images in “core-sprite.png”. Getting the iPhone down to 1 CSS background image, like the desktop and iPad versions, would eliminate 6 HTTP requests.

Wikipedia

Wikipedia
platform   reqs   HTML     elems
desktop    47     121 kB   1642
iPad       47     121 kB   1632
iPhone     27     83 kB    1108

Wikipedia’s “flowers” page is identical on the desktop and iPad. The iPhone version is smaller in all three categories. The number of requests on the iPhone is high – 27 compared to the average of 15 requests. Most of these are large images that aren’t visible in the initial page, but they are loaded before the window onload event. This isn’t a huge deal, but it does make page load measurements longer and can block more important resources like scripts at the bottom of the page. It would be better to load these images dynamically after the onload event.
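One way to do that is sketched below. It assumes a hypothetical markup convention where below-the-fold images are written as <img data-src="..."> with no src attribute, so the browser doesn't fetch them until the script assigns src. The function name and the data-src convention are my own illustration, not something Wikipedia uses.

```javascript
// Start loading deferred images only after the window's load event has fired.
// Assumes a hypothetical markup convention: <img data-src="..."> with no src attribute.
function deferImages(win, doc) {
  win.addEventListener("load", function () {
    for (const img of doc.querySelectorAll("img[data-src]")) {
      // Assigning src is what triggers the image download.
      img.src = img.getAttribute("data-src");
    }
  });
}
```

In a page you would call deferImages(window, document) from an inline script. Taking the window and document as parameters just makes the function easy to exercise outside a browser.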

Facebook

Facebook
platform   reqs   HTML     elems
desktop    16     31 kB    524
iPad       16     29 kB    519
iPhone     4      2 kB     114

The Facebook login page was tested instead of a logged-in page to avoid differences based on user account. The desktop and iPad versions are almost identical. The iPhone version, on the other hand, is less than a quarter of the size of the others. How did they achieve such a reduction of HTTP requests – from 16 down to 4?

Comparing the desktop version’s page resources to the iPhone version’s page resources shows that the iPhone version is much lighter. The desktop version has 4 scripts, 5 stylesheets, 2 images, and 4 CSS background images. The iPhone version has 2 stylesheets and 1 image (plus the HTML document) – that’s it. If you visually compare the desktop and iPhone versions you’ll confirm that the iPhone UI is much simpler, thus explaining the big drop in number of requests.

Twitter

Twitter
platform   reqs   HTML     elems
desktop    45     57 kB    873
iPad       61     55 kB    886
iPhone     3      12 kB    185

Twitter’s login page is about the same on desktop and iPad. The iPhone version is significantly smaller, especially in terms of number of HTTP requests. A visual comparison shows that the iPhone UI is much simpler than the desktop UI. Not only is the number of HTTP requests dramatically smaller, but the HTML document size is much smaller, as well. There are no data: URIs or calls to localStorage in the iPhone HTML, which keeps the HTML size so small.

Amazon

Amazon
platform   reqs   HTML     elems
desktop    55     135 kB   1157
iPad       49     134 kB   1157
iPhone     16     11 kB    247

Amazon’s front page is the same on desktop and iPad. The iPhone version is significantly smaller: 1/3 as many requests, an HTML document 1/10 the size, and 1/5 the number of DOM elements.

Although the iPhone version is much smaller, the number of HTTP requests could be reduced. The iPhone page resources shows it has 8 CSS background images. Running SpriteMe (yes, you can even run SpriteMe on mobile devices!) shows 3 of these could be eliminated by using sprites. And perhaps the logo, gold chest, and cart could be sprited, as well. A sprite image exists in the desktop page resources that contains the logo, cart, gold chest, and some of the gradients and arrows. Creating a similar sprite for mobile could drop the number of requests from 16 to 11.

eBay

eBay
platform   reqs   HTML     elems
desktop    39     69 kB    934
iPad       39     64 kB    908
iPhone     22     24 kB    304

eBay’s front page is similar to Amazon in that the desktop and iPad versions are identical. The iPhone version, on the other hand, is 1/2 to 1/3 the size. The size of the HTML document and the number of DOM elements is below the average (which is good), but the number of HTTP requests is higher than average (not so good).

In the 16 HTTP requests in the iPhone page resources there are 10 images and 4 CSS background images. Of these, 4-8 look like good candidates for spriting. Alternatively, many of these images are small and could be inlined as data: URIs.

Craigslist

Craigslist
platform   reqs   HTML     elems
desktop    4      32 kB    1121
iPad       4      32 kB    1115
iPhone     4      32 kB    1115

Craigslist is the same across all three platforms: desktop, iPad, and iPhone. The page is light so performance is not an issue, but the amount of content is visually overwhelming on the iPhone’s small screen.

(Just a funny side note: I tested this from my Mom’s house in North Carolina. Craigslist redirected me to the version for Jacksonville, NC which for some reason has a domain name based on the nearby town of Onslow. I just thought it was funny that my performance testing got sent to a URL starting with “onslow”.)

Yahoo Front Page

Yahoo
platform   reqs   HTML     elems
desktop    42     216 kB   1013
iPad       22     33 kB    473
iPhone     39     106 kB   1254

There are a few curiosities when comparing the desktop and iPhone versions of the Yahoo front page. The iPhone version has about the same number of HTTP requests and the HTML document is half the size, but the number of DOM elements is greater by ~20%. Although the iPhone version has fewer scripts, stylesheets, and CSS background images, it has 31 images compared to just 16 images in the desktop version. Many of these images are downloaded after the window onload event. It appears that they are being prefetched – they’re not visible on the screen so perhaps are used on subsequent pages.

The iPad version is dramatically smaller than both the desktop and iPhone versions. This is the only website where the iPad version is smaller than the iPhone version. Why would the iPad version be whittled down so much smaller than the iPhone version? The UI is very different. Perhaps there’s less prefetching happening, with less certainty of where the user may go next.

Youtube

Youtube
platform   reqs   HTML     elems
desktop    29     82 kB    895
iPad       15     193 kB   337
iPhone     5      193 kB   297

There’s something interesting going on with the Youtube page. While the number of HTTP requests and DOM elements gets progressively smaller, the size of the HTML document on the iPad and iPhone is more than 2x larger than that of the desktop version. The iPhone version uses 6 data: URIs compared to the desktop version, but the biggest cause of the larger HTML document on iPhone is inlining JavaScript instead of fetching it from external scripts. Similar to the data: URI technique, this reduces the number of HTTP requests by embedding the resources inside the HTML document.

Conclusion

I hope you’ve found these mobile observations informative, but even more I hope you’ve seen that Jdrop is a useful mechanism for gathering, analyzing, and sharing mobile performance data. Take it for a spin, and if you have a tool, especially a bookmarklet, that does performance analysis, contact me about integrating it with Jdrop.
