Announcing the HTTP Archive
I’m proud to announce the release of the HTTP Archive. From the mission statement:
Successful societies and institutions recognize the need to record their history – this provides a way to review the past, find explanations for current behavior, and spot emerging trends. In 1996 Brewster Kahle realized the cultural significance of the Internet and the need to record its history. As a result he founded the Internet Archive which collects and permanently stores the Web’s digitized content.
In addition to the content of web pages, it’s important to record how this digitized content is constructed and served. The HTTP Archive provides this record. It is a permanent repository of web performance information such as size of pages, failed requests, and technologies utilized. This performance information allows us to see trends in how the Web is built and provides a common data set from which to conduct web performance research.
The HTTP Archive code is open source and the data is downloadable. Approximately 17,000 top websites are examined every two weeks. I started gathering the data in October 2010. The list of URLs is derived from various sources including Alexa, Fortune 500, and Quantcast. The system is built on the shoulders of WebPagetest which downloads each URL and gathers a HAR file, screenshots, video of the site loading, and other information. From this information the HTTP Archive extracts data, stores it in a database, aggregates the data, and provides various statistical analyses.
This chart is one of the many interesting stats from the HTTP Archive. It shows the number of bytes downloaded for the average web page broken out by content type.
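As a rough illustration of the kind of breakdown that chart summarizes, the sketch below sums response bytes per content type for a single page's HAR. The field names (`log.entries`, `response.bodySize`, `response.content.mimeType`) come from the HAR format; the bucket names, the `bucketFor` and `bytesByContentType` helpers, and the sample data are assumptions made for this example, not the HTTP Archive's actual code.

```javascript
// Map a MIME type to a coarse bucket. The bucket names are assumptions.
function bucketFor(mimeType) {
  const mime = (mimeType || "").toLowerCase();
  if (mime.startsWith("image/")) return "images";
  if (mime.includes("javascript")) return "scripts";
  if (mime.includes("css")) return "stylesheets";
  if (mime.includes("html")) return "html";
  return "other";
}

// Sum response body bytes per bucket for one page's HAR.
function bytesByContentType(har) {
  const totals = {};
  for (const entry of har.log.entries) {
    const bucket = bucketFor(entry.response.content.mimeType);
    // HAR uses -1 for an unknown bodySize; count those responses as 0 bytes.
    const size = Math.max(entry.response.bodySize, 0);
    totals[bucket] = (totals[bucket] || 0) + size;
  }
  return totals;
}

// Hypothetical three-request HAR fragment, trimmed to the fields used above.
const sampleHar = { log: { entries: [
  { response: { bodySize: 14000, content: { mimeType: "image/jpeg" } } },
  { response: { bodySize: 30000, content: { mimeType: "text/html; charset=utf-8" } } },
  { response: { bodySize: -1, content: { mimeType: "image/png" } } },
] } };

const totals = bytesByContentType(sampleHar);
console.log(totals);
```

Averaging these per-page totals across all pages in a run would give a per-content-type figure for the "average web page".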
In addition to interesting stats for an individual run, the HTTP Archive also has trending charts. For example, this chart shows that the total transfer size of web pages has increased 88 kB (15%) over the last six months. (In order to compare apples-to-apples, since the list of top sites can change month to month, these comparisons are done using the intersection of ~1100 URLs analyzed in every run.)
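The intersection idea can be sketched in a few lines: keep only the URLs present in every run, then compare the mean transfer size of the first and last run over that fixed set. The data shape (one object per run mapping URL to kB), the `compareOnIntersection` name, and all numbers below are made up for illustration.

```javascript
// runs: array of objects, each mapping URL -> total transfer size in kB.
function compareOnIntersection(runs) {
  const has = (run, url) => Object.prototype.hasOwnProperty.call(run, url);
  // Keep only URLs that were analyzed in every run.
  const common = Object.keys(runs[0]).filter(url => runs.every(run => has(run, url)));
  const mean = run => common.reduce((sum, url) => sum + run[url], 0) / common.length;
  const first = mean(runs[0]);
  const last = mean(runs[runs.length - 1]);
  return {
    urls: common.length,
    changeKb: last - first,
    changePct: ((last - first) / first) * 100,
  };
}

// Hypothetical runs: only a.example and b.example appear in all three.
const runs = [
  { "a.example": 500, "b.example": 700, "c.example": 900 },
  { "a.example": 550, "b.example": 760 },
  { "a.example": 600, "b.example": 780, "d.example": 100 },
];

const trend = compareOnIntersection(runs);
console.log(trend.urls, trend.changeKb, trend.changePct.toFixed(1) + "%");
```

Sites that enter or leave the list (`c.example`, `d.example` here) never influence the trend, which is the point of the apples-to-apples comparison. The sketch assumes at least one URL is common to all runs.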
The likely cause of this increase is the size of images. The next chart shows that the total transfer size of images increased 61 kB (22%) over the same time period, even though the total number of image requests only increased by two (6%). Why did the size of images grow so much? This highlights one of the benefits of the HTTP Archive. Anyone who wants to answer that question can download the data and perform an analysis. Possible things to check include whether the ratio of JPG images increased (since the average size of JPG images is 14 kB compared to 3 kB for GIF and 8 kB for PNG) and whether the increase occurs across most images in the page or only in a few larger images.
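One possible starting point for that analysis is sketched below: for a page's images in a given run, compute the fraction that are JPG and the fraction of image bytes taken by the single largest image. The record shape (`format`, `kb`) and the `imageProfile` helper are hypothetical, and the sample sizes simply reuse the per-format averages quoted above.

```javascript
// images: non-empty array of { format, kb } records for one page in one run.
function imageProfile(images) {
  const totalKb = images.reduce((sum, img) => sum + img.kb, 0);
  const jpgCount = images.filter(img => img.format === "jpg").length;
  const largestKb = Math.max(...images.map(img => img.kb));
  return {
    totalKb,
    jpgShare: jpgCount / images.length, // fraction of images that are JPG
    largestShare: largestKb / totalKb,  // fraction of bytes in the largest image
  };
}

// Hypothetical before/after data: one GIF is replaced by a JPG.
const before = imageProfile([
  { format: "gif", kb: 3 }, { format: "gif", kb: 3 },
  { format: "png", kb: 8 }, { format: "jpg", kb: 14 },
]);
const after = imageProfile([
  { format: "gif", kb: 3 }, { format: "jpg", kb: 14 },
  { format: "png", kb: 8 }, { format: "jpg", kb: 14 },
]);

console.log(before, after);
```

In this made-up case the image count is unchanged while the total grows from 28 kB to 39 kB, driven entirely by a higher JPG share; a rising `largestShare` would instead point at a few large images.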
There are hours of slicing and dicing in store for performance engineers and anyone who loves data. About 20 charts are available today. More will come (add suggestions to the list of issues). The most important thing to remember is that the data is being collected and archived, and will be accessible to new views that are added in the future.
Check out the FAQ for more information. I encourage you to join the HTTP Archive group in order to follow our progress and add to the discussion. Once there you’ll see the first email from Brewster Kahle who says “Fantastic! I have not seen such a wealth of information on the performance of the web recorded and so enriched.” I hope you too will find it worthwhile.
Storager case study: Bing, Google
Storager
Last week I posted my mobile comparison of 11 top sites. One benefit of analyzing top websites is finding new best practices. In that survey I found that the mobile version of Bing used localStorage to reduce the size of its HTML document from ~200 kB to ~30 kB. This is a good practice in general and makes even more sense on mobile devices, where latencies are higher, caches are smaller, and localStorage is widely supported.
I wanted to further explore Bing’s use of localStorage for better performance. One impediment is that there’s no visibility into localStorage on a mobile device. So I created a new bookmarklet, Storager, and added it to the Mobile Perf uber bookmarklet. (In other words, just install Mobile Perf – it bundles Storager and other mobile bookmarklets.)
Storager lets you view, edit, clear, and save localStorage for any web page on any browser – including mobile. Viewing localStorage on a 320×480 screen isn’t ideal, so I did the obvious next step and integrated Storager with Jdrop. With these pieces in place I’m ready to analyze how Bing uses localStorage.
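To give a feel for what viewing localStorage involves, here is a minimal sketch that walks any object exposing the Web Storage interface (`length`, `key()`, `getItem()`), truncates long values, and totals the stored characters. It is not Storager's source; `summarizeStorage`, `makeFakeStorage`, and the sample entries are assumptions for this example.

```javascript
// List each entry as "key: value", truncating long values, and total the size.
function summarizeStorage(storage, maxChars = 60) {
  const rows = [];
  let totalChars = 0;
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    const value = storage.getItem(key);
    totalChars += value.length;
    const shown = value.length > maxChars ? value.slice(0, maxChars) + "..." : value;
    rows.push(`${key}: ${shown}`);
  }
  return { rows, totalChars };
}

// In-memory stand-in so the sketch also runs outside a browser (an assumption
// for this example; in a page you would pass window.localStorage instead).
function makeFakeStorage(entries) {
  const keys = Object.keys(entries);
  return {
    length: keys.length,
    key: i => keys[i],
    getItem: k => entries[k],
  };
}

const fake = makeFakeStorage({
  "demo.css": "*{margin:0;padding:0}",
  "demo.js": "x".repeat(100),
});

const summary = summarizeStorage(fake);
console.log(summary.rows.join("\n"));
console.log(summary.totalChars + " characters stored");
```

The character count is only an approximation of storage size, since browsers store these strings internally in their own encoding.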
Bing localStorage
My investigation begins by loading Bing on my mobile device – after the usual redirects I end up at the URL http://m.bing.com/?mid=10006. Opening Storager from the Mobile Perf bookmarklet, I see that localStorage has ~10 entries. Since I’m not sure when these were written to localStorage, I clear localStorage (using Storager) and hit reload. Opening Storager again, I see the same ~10 entries and save those to Jdrop. I show the truncated entries below. I made the results public so you can also view the Storager results in Jdrop.
BGINFO: {"PortraitLink":"http://www.bing.com/fd/hpk2/Legzira_EN-US262...
CApp.Home.FD66E1A3: #ContentBody{position:relative;overflow:hidden;height:100%;-w...
CUX.Keyframes.B8625FE...: @-webkit-keyframes scaleout{from{-webkit-transform:scale3d(1,...
CUX.Site.18BDD936: *{margin:0;padding:0}table{border-collapse:separate;border-sp...
CUX.SiteLowRes.C8A1DA...: .blogoN{background-image:url(data:image/png;base64,iVBORw0KGg...
JApp.Home.DE384EBF: (function(){function a(){Type.registerNamespace("SS");SS.Home...
JUX.Compat.0907AAD4: function $(a){return document.getElementById(a)}var FireEvent...
JUX.FrameworkCore.A39...: (function(){function a(){Type.registerNamespace("BM");AjaxSta...
JUX.MsCorlib.172D90C3: window.ss={version:"0.6.1.0",isUndefined:function(a){return a...
JUX.PublicJson.540180...: if(!this.JSON)this.JSON={};(function(){function c(a){return a...
JUX.UXBaseControls.25...: (function(){function a(){Type.registerNamespace("UXControls")...
RMSM.Keys: CUX.Site.18BDD936~CUX.Keyframes.B8625FEE~CApp.Home.FD66E1A3~C...
These entries are written to localStorage as part of downloading the Bing search page. These entries add up to ~170 kB in size (uncompressed). This would explain the large size of the Bing HTML document on mobile. We can verify that these keys are downloaded via the HTML document by searching for a unique string from the data such as “FD66E1A3”. We find this string in the Bing document source (saved in Jdrop) as the id of a STYLE block:
<style data-rms="done" id="CApp.Home.FD66E1A3" rel="stylesheet" type="text/css">
#ContentBody{position:relative;overflow:hidden;height:100%;-webkit-tap-highlight-color:...
Notice how the content of this STYLE block matches the data in localStorage. The other localStorage entries also correspond to SCRIPT and STYLE blocks in the initial HTML document. Bing writes these blocks to localStorage and then on subsequent page views reads them back and inserts them into the document, resulting in a much smaller HTML document download. The Bing server knows which blocks are in the client’s localStorage via a cookie whose value is made up of the localStorage keys delimited by “~”:
RMSM=JApp.Home.DE384EBF~JUX.UXBaseControls.252CB7BF~JUX.FrameworkCore.A39F6425~JUX.PublicJson.540180A4~JUX.Compat.0907AAD4~JUX.MsCorlib.172D90C3~CUX.SiteLowRes.C8A1DA4E~CApp.Home.FD66E1A3~CUX.Keyframes.B8625FEE~CUX.Site.18BDD936~;
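The general pattern described above can be sketched as follows. This is not Bing's code: the function names, the split between "server" and "client" helpers, and the Map-backed storage stand-in are all assumptions; only the “~”-delimited key list with a trailing “~” follows the cookie format shown.

```javascript
// Parse a "~"-delimited key list such as the RMSM cookie value.
function parseKeyList(cookieValue) {
  return new Set(
    (cookieValue || "").split("~").map(k => k.trim()).filter(k => k.length > 0)
  );
}

// Server side (hypothetical): of the blocks a page needs, return those the
// client has not reported as stored, which must be inlined in the HTML.
function blocksToInline(requiredKeys, cookieValue) {
  const cached = parseKeyList(cookieValue);
  return requiredKeys.filter(key => !cached.has(key));
}

// Client side (hypothetical): store freshly inlined blocks and return the
// updated key list to write back into the cookie.
function storeBlocks(storage, blocks, cookieValue) {
  const keys = parseKeyList(cookieValue);
  for (const [key, content] of Object.entries(blocks)) {
    storage.setItem(key, content);
    keys.add(key);
  }
  return Array.from(keys).join("~") + "~";
}

// Map-backed stand-in for localStorage so the sketch runs outside a browser.
const store = new Map();
const storage = { setItem: (k, v) => store.set(k, v) };

const required = ["CApp.Home.FD66E1A3", "CUX.Site.18BDD936"];

// First visit: no cookie yet, so every block is inlined and then stored.
const firstVisit = blocksToInline(required, "");
const cookie = storeBlocks(storage, {
  "CApp.Home.FD66E1A3": "#ContentBody{position:relative}",
  "CUX.Site.18BDD936": "*{margin:0;padding:0}",
}, "");

// Second visit: the cookie lists both keys, so nothing needs to be inlined.
const secondVisit = blocksToInline(required, cookie);
console.log(firstVisit.length, cookie, secondVisit.length);
```

Because the key names appear to embed a content hash, a changed block would get a new key that is absent from the cookie, so it would be inlined again on the next request.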
Just to be clear, everything above happens during the loading of the blank Bing search page. Once a query is issued the search results page downloads more keys (~95 kB additional data) and expands the cookie with the new key names.
Google localStorage
Another surprise from last week’s survey was that the mobile version of Google Search had 68 images in the results HTML document as data: URIs, compared to only 10 for desktop and iPad. Mobile browsers open fewer TCP connections and these connections are typically slower compared to desktop, so reducing the number of HTTP requests is important.
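For readers unfamiliar with data: URIs, the sketch below shows how image bytes are turned into one, which is what lets an image travel inside the HTML instead of as a separate request. It uses Node.js's `Buffer` for base64 encoding; the `toDataUri` helper is a name chosen for this example.

```javascript
// Build a data: URI from raw bytes. Buffer is Node.js-specific; a browser
// would need a different base64 step.
function toDataUri(bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// The eight-byte signature that starts every PNG file, used as sample input.
const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

const uri = toDataUri(pngSignature, "image/png");
console.log(uri); // data:image/png;base64,iVBORw0KGgo=
```

The `iVBORw0KGg` prefix is the same one visible in the truncated `CUX.SiteLowRes` entry from Bing above. Base64 encodes every 3 bytes as 4 characters, so an inlined image is roughly a third larger than the original file before compression.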
The additional size from inlining data: URIs doesn’t account for the large size of the Google Search results page, so perhaps localStorage is being seeded here, too. Using Storager we see over 130 entries in localStorage after a search for flowers. Here’s a sample. (As before, the key names and values may be truncated.)
mres.-8Y5Dw_nSfQztyYx: <style>a{color:#11c}a:visited{color:#551a8b}body{margin:0;pad...
mres.-Kx7q38gfNkQMtpx: <script> //<![CDATA[ var Zn={},bo=function(a,b){b&&Zn[b]||(ne...
mres.0kH3gDiUpLA5DKWN: <style>.zl9fhd{padding:5px 0 0}.sc59bg{clear:both}.pyp56b{tex...
mres.0thHLIQNAKnhcwR4: <style>.fdwkxt{width:49px;height:9px;background:url("data:ima...
mres.36ZFOahhhEK4t3WE: <script> //<

