HTTP Archive switches to Chrome

May 23, 2016 9:08 am | 7 Comments

The HTTP Archive crawls the world’s top URLs twice each month and records detailed information like the number of HTTP requests, the most popular image formats, and the use of gzip compression. In addition to aggregate stats, the HTTP Archive has the same data for individual websites plus images and video of the site loading. It’s built on top of WebPageTest (yay Pat!), and all our code and data are open source. HTTP Archive is part of the Internet Archive and is made possible thanks to our sponsors: Google, Mozilla, New Relic, O’Reilly Media, Etsy, dynaTrace, Instart Logic, Catchpoint Systems, Fastly, SOASTA mPulse, and Hosting Facts.

I started the HTTP Archive in November 2010. Even though I worked at Google, I decided to use Internet Explorer 8 to gather the data because I wanted the data to represent the typical user experience and IE 8 was the world’s most popular browser. Later, testing switched to IE 9 when it became the most popular browser. Chrome’s popularity has been growing, so we started parallel testing with Chrome last year in anticipation of switching over. This month, it was determined that Chrome is the world’s most popular browser.

In May 2011, I launched HTTP Archive Mobile. This testing was done with real iPhones. It started by testing 1,000 URLs and has “scaled up” to 5,000 URLs. I put that in quotes because 5,000 URLs is far short of the 500,000 URLs being tested on desktop. Pat hosts these iPhones at home. We’ve found that maintaining real mobile devices for large-scale testing is costly, unreliable, and time-consuming. For the last year we’ve talked about how to track mobile data in a way that would allow us to scale to 1M URLs. We decided that emulating Android using Chrome’s mobile emulation features was the best option, and started parallel testing in this mode early last year.

Today, we’re announcing our switch from IE 9 and real iPhones to Chrome and emulated Android as the test agents for HTTP Archive.

We swapped in the new Chrome and emulated Android data starting March 1, 2016. In other words, if you go to HTTP Archive, the data starting from March 1, 2016, is from Chrome, and everything prior is from Internet Explorer. Similarly, if you go to HTTP Archive Mobile, the data starting from March 1, 2016, is from emulated Android, and everything prior is from real iPhones. For purposes of comparison, we’re temporarily maintaining HTTP Archive IE and HTTP Archive iPhone, where you can see the data from those test agents up to the current day. We’ll keep doing this testing through June.

This switchover opens the way for us to expand both our desktop and mobile testing to the top 1 million URLs worldwide. It also lowers our hardware and maintenance costs, and allows us to use the world’s most popular browser. Take a look today at our aggregate trends and see what stats we have for your website.

7 Responses to HTTP Archive switches to Chrome

  1. Is the archive set up to track how many servers offer Brotli-compressed content?

  2. Hi, Eric! I don’t think we look at Brotli specifically vs just “compressed”. Now that we’re using Chrome we can get more stats on things like Brotli and H2. It would be great if you filed a ticket (https://github.com/HTTPArchive/httparchive/issues), esp. if you could explain what we can look for to detect Brotli (and other) compression (e.g., which header values).

  3. What connection profile is being used for the mobile tests?

  4. Joseph: We’re still using the 3G connection profile.

  5. Does the mobile emulation still use DummyNet for simulating 3G network characteristics, or are you now using the Chrome Network Conditions emulation?

  6. Great article. I’m a fan of Chrome, and for me it’s been easier to crack the one million URL benchmark with some of my blogs. Thanks for the content; it was a great read!

  7. One thing though – and I don’t mean to be nitpicky, but when you compare 4 percentage *points* (increase in requests with caching headers) to 12 *percent* (increase in number of requests per page) that’s rather misleading.
    The increase from 42% to 46% cached resources is an increase of 9.5 percent. Still a far cry from the 24% increase in total transfer size and certainly disappointing that it’s continuing to trail behind the reqs per page, but still.
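
The distinction drawn in the last comment, between a change in percentage points and a relative change in percent, can be checked with a short calculation. The following is a minimal sketch in Python; the function names are illustrative and not taken from any HTTP Archive code, and the 42% and 46% figures are the ones quoted in the comment.

```python
def percentage_point_change(old_pct, new_pct):
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_percent_change(old_pct, new_pct):
    """Relative change from old_pct to new_pct, expressed in percent."""
    if old_pct == 0:
        raise ValueError("relative change is undefined when the old value is 0")
    return (new_pct - old_pct) / old_pct * 100.0


if __name__ == "__main__":
    # Share of cached resources quoted in the comment: 42% before, 46% after.
    old, new = 42.0, 46.0
    print(f"{percentage_point_change(old, new):.1f} percentage points")  # 4.0
    print(f"{relative_percent_change(old, new):.1f} percent")            # 9.5
```

Comparing the 4-point figure directly against a relative increase such as 12 percent mixes two different units; converting both to relative change, as above, puts them on the same scale.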