HTTP Archive: today’s runs
I just finished processing the “May 16 2011” run for HTTP Archive and HTTP Archive Mobile. Here are some interesting observations.
HTTP Archive (desktop)
As of today there are six months of historical data in the HTTP Archive. As a reminder, the world’s top ~17K web pages are being crawled. Since the actual URLs in the list can change from run to run, I start by looking at the trends for the intersection of URLs. This means the list of URLs is exactly the same for each data point. The total transfer size continues to grow: up 8 kB (1.1%) since the last run on Apr 30 2011, and up 53 kB (8%) from the first run back on Nov 16 2010.
Looking at the Image Transfer Size chart we see images are the main cause of this growth – up 5 kB from Apr 30 2011 and 43 kB since Nov 16 2010. Conversely, the transfer size of Flash has steadily dropped from 81 kB on Nov 16 2010 down to 71 kB on this last run on May 16 2011 – a 12% decrease.
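The deltas quoted above are easy to re-check by hand. Here is a minimal Python sketch of that arithmetic, using the Flash figures from this post; the `change` helper is a hypothetical name for illustration and is not part of the HTTP Archive code.

```python
def change(before_kb, after_kb):
    """Return (absolute change in kB, percent change relative to `before_kb`)."""
    delta = after_kb - before_kb
    percent = 100.0 * delta / before_kb
    return delta, percent


# Flash transfer size: 81 kB on Nov 16 2010 vs 71 kB on May 16 2011.
delta, percent = change(81, 71)
print(f"{delta:+d} kB ({percent:+.0f}%)")  # -10 kB (-12%)
```

The percentage is always taken relative to the earlier run, which is why a 10 kB drop from 81 kB rounds to 12%.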
I used the new Compare Stats page to compare the interesting stats from Nov 15 2010 to May 16 2011. I again chose the “intersection” of URLs to get an apples-to-apples comparison. In addition to the transfer size increases mentioned previously, we can see the growth of various JavaScript libraries on these ~17K web pages over the last six months:
- jQuery is up from 39% to 44%
- Facebook widgets have grown from 8% to 13%
- Twitter widgets increased from 2% to 4%
The “Pages Using Google Libraries API” chart shows adoption of this CDN has grown from 10% to 13% since Nov 16 2010. This means even more cross-site caching benefits for users.
HTTP Archive Mobile
The HTTP Archive Mobile just launched last week. There are only two weeks of historical data, gathered on just the top 100 web pages, so there aren’t any major shifts in the trending charts just yet. Nevertheless, comparing the mobile stats between the last two runs, May 12 and May 16, yields some interesting revelations.
The “Pages with the Most JavaScript” chart shows that the Twitpic page’s JavaScript grew from 308 kB to 374 kB – a 66 kB (21%) increase. This highlights the value of HTTP Archive (both desktop and mobile) to individual websites as a way to track performance stats and have a permanent history.
There are some other interesting stats at the individual website level, but we’ll need a few months of mobile data before we can draw conclusions about any trends. In the meantime, check back here around the 15th and 30th of each month to see the latest runs and discover new observations about how the web is running.
HTTP Archive Mobile
This morning during my talk at Mobilism I announced the HTTP Archive Mobile – a permanent repository for mobile performance data.
Ever since I announced the HTTP Archive (based on data gathered from Internet Explorer using WebPagetest) people have been asking, “What about data gathered from a mobile device?” It’s a logical next step. Thankfully, Blaze.io came on the scene a few months ago with a solution, Mobitest, for connecting to mobile devices and gathering this kind of information. Blaze.io has been doing great work in the area of mobile performance. I met Guy Podjarny, Blaze.io’s CTO, a few months ago and we’ve had several discussions about performance. A few weeks ago Guypo and I decided to start working on the HTTP Archive Mobile. It required a few changes on his side and a few on mine, but we quickly got it up and running and have been gathering data for the past week in anticipation of this announcement.
Here are some interesting comparisons of the Top 100 web pages from desktop vs mobile:
- Total bytes downloaded is 401 kB for desktop vs 271 kB for mobile.
- Total size of images, scripts, and stylesheets is smaller on mobile, but size of HTML is bigger. This is likely from JavaScript, CSS, and images being inlined in the HTML document – a good sign for performance.
- The New York Times is in the top 5 pages with the most JavaScript with 230 kB on desktop and 326 kB on mobile – 96 kB more JavaScript on mobile than desktop.
- The percentage of sites that have at least one redirect is 54% for desktop compared to 68% for mobile. Many websites (Facebook, Yahoo!, Bing, Taobao, etc.) redirect from www.* to m.* on mobile browsers. But it would be better to lower the mobile percentage since redirects inflict an even greater delay on mobile devices.
This is just the beginning. The HTTP Archive Mobile is currently in “Alpha” mainly because we’re gathering data for just the Alexa Global Top 100 sites and only have a week’s worth of data. More URLs and more data are on the way. Already there are valuable takeaways from what’s there, and these will increase as the number of websites and length of time grow. Take a look for yourself and add comments below on anything you find that’s interesting or puzzling. And thanks again to Guypo and Blaze.io for making the Mobitest framework available for collecting the data.
HTTP Archive: Top 100, Top 1000 & compare stats
Right now the HTTP Archive analyzes the world’s top 17,000 web pages, gathering information about each site’s construction. It’s interesting data, especially for a performance junkie like me. Subsetting the data for comparisons is a challenge given the numerous ways this long list of URLs could be sliced.
This past week I added two new subsets: Top 100 and Top 1000. So now when you go to the Trends and Stats pages you have the following choices:
- All – The entire set of ~17K web pages but the exact URLs might vary from run to run due to errors, etc.
- Intersection – The set of URLs for which data was gathered in every run. Right now there are ~15K that have data in every run so far. This is great for apples-to-apples comparisons.
- Top 100 – The world’s Top 100 based on Alexa.
- Top 1000 – The world’s Top 1000 based on Alexa.
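To make the “Intersection” subset concrete, here is a small Python sketch of how such a set could be computed. It assumes each run is available as a set of successfully crawled URLs; the `intersection_urls` function and the example URLs are hypothetical, and this is not necessarily how the HTTP Archive itself builds the subset.

```python
def intersection_urls(runs):
    """Return the URLs that have data in every run.

    `runs` maps a run label to the collection of URLs crawled in that run.
    """
    url_sets = [set(urls) for urls in runs.values()]
    if not url_sets:
        return set()
    return set.intersection(*url_sets)


# Made-up runs: b.example failed in one run, d.example appears in only one.
runs = {
    "run 1": {"http://a.example/", "http://b.example/", "http://c.example/"},
    "run 2": {"http://a.example/", "http://c.example/", "http://d.example/"},
    "run 3": {"http://a.example/", "http://b.example/", "http://c.example/"},
}

print(sorted(intersection_urls(runs)))
# ['http://a.example/', 'http://c.example/']
```

Because a URL drops out as soon as it is missing from a single run, the intersection can only shrink or stay the same as more runs are added.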
There are some amazing differences between these sets of sites. To make it easier to explore these differences I added the Compare Stats page. You can pick two different runs and sets of URLs and see their stats charts side-by-side. I’m amazed at some of the differences between the Top 100 and Top 1000 for stats from the April 30 2011 run:
- total bytes downloaded: 401 kB vs 674 kB (Top 100 vs Top 1000)
- use of jQuery: 27% vs 43%
- use of Google Analytics: 25% vs 52%
- pages using Flash: 35% vs 49%
- resources with no caching headers: 26% vs 41%
Top 100 Bytes Downloaded
Top 1000 Bytes Downloaded
One takeaway is that stats that are critical for good performance (bytes downloaded, caching) worsen when we expand from the Top 100 to the Top 1000. What happens if we look further down the tail at “All”? I encourage you to check it out yourself and see how your site compares.
HTTP Archive: servers and most 404s
I launched the HTTP Archive about a month ago. The reaction has been positive including supportive tweets from Tim O’Reilly, Werner Vogels, Robert Scoble, and John Resig. I’m also excited about the number of people that have already started contributing to the project. Two new stats charts are available thanks to patches from open source contributors.
James Byers contributed the patch for generating the Most Common Servers pie chart. This chart is similar to BuiltWith’s Web Server chart. BuiltWith shows a higher presence of IIS than shown here. Keep in mind the sample sets are different – the HTTP Archive hits the world’s top ~17K URLs while BuiltWith is covering 1M URLs.
The other new chart comes from Carson McDonald. It shows pages with the most 404s. Definitely a list you don’t want to find your website on.
I’ve added some other features I’ll blog about tomorrow and am planning a bigger announcement later this week, so stay tuned for some more HTTP Archive updates.



