HTTP Archive: 2011 recap
I started the HTTP Archive back in October 2010. It’s hard to believe it’s been that long. The project is going well:
- The number of websites archived has grown from ~15K to ~55K. (Our goal for this year is 1M!)
- In May we partnered with Blaze.io to launch the HTTP Archive Mobile.
- In June we merged with the Internet Archive.
- Joining the Internet Archive allowed us to accept financial support from our incredible sponsors: Google, Mozilla, New Relic, O’Reilly Media, Etsy, Strangeloop, and dynaTrace Software. Last month Torbit became our newest sponsor.
- As of last week we’ve completely moved to our new data center, ISC.
I’m pleased with how the WPO community has contributed to the HTTP Archive. The project wouldn’t have been possible without Pat Meenan and his impressive, ever-growing WebPagetest framework. A number of people have contributed to the open source code including Jonathan Klein, Yusuke Tsutsumi, Carson McDonald, James Byers, Ido Green, Mike Pfirrmann, Guy Leech, and Stephen Hay.
This is our first complete calendar year archiving website statistics. I want to start a tradition of doing an annual recap of insights from the HTTP Archive.
2011 vs 2012
The most noticeable trend during 2011 was the growth in the size of websites and their resources. Table 1 shows the transfer size by content type for the average website. For example, “379kB” is the total size of images downloaded for an average website. (Since the sample of websites changed during the year, these stats are based on trends for the intersection of 11,910 websites that were present in every batch run.)
Table 1. Transfer Size by Content Type

| Content Type | Jan 2011 | Jan 2012 | Change |
|---|---|---|---|
One takeaway from this data is that images make up a majority of the bytes downloaded for websites (59%). Also, images are the second-fastest-growing content type for desktop and the fastest-growing content type for mobile. These two observations highlight the need for more performance optimizations for images. Many websites would benefit from losslessly compressing their images with existing tools. WebP is another candidate for reducing image size.
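To see why lossless compression helps, note that PNG pixel data is just a zlib stream: many images ship with that stream poorly deflated plus metadata chunks the browser never needs. The sketch below (a toy illustration, not one of the tools mentioned above) re-deflates a PNG at maximum compression and drops ancillary chunks, leaving the pixels byte-for-byte identical:

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def chunks(data):
    """Yield (type, payload) for each chunk in a PNG byte string."""
    assert data[:8] == PNG_SIG, "not a PNG"
    pos = 8
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        yield data[pos + 4:pos + 8], data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC

def chunk(ctype, payload):
    """Serialize one PNG chunk, including its CRC."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)

def recompress(png):
    """Losslessly shrink a PNG: re-deflate the filtered pixel data at
    maximum compression and keep only critical chunks. Pixels unchanged."""
    raw = zlib.decompress(b"".join(p for t, p in chunks(png) if t == b"IDAT"))
    out = [PNG_SIG]
    for ctype, payload in chunks(png):
        if ctype == b"IEND":
            out.append(chunk(b"IDAT", zlib.compress(raw, 9)))
        if ctype in (b"IHDR", b"PLTE", b"tRNS", b"IEND"):
            out.append(chunk(ctype, payload))
    return b"".join(out)

# Demo: build a tiny 8x8 red RGB PNG whose pixel data was deflated at
# level 0 (stored, uncompressed), then recompress it.
ihdr = struct.pack(">IIBBBBB", 8, 8, 8, 2, 0, 0, 0)
scanlines = (b"\x00" + b"\xff\x00\x00" * 8) * 8  # filter byte + red pixels
original = (PNG_SIG + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(scanlines, 0))
            + chunk(b"IEND", b""))
smaller = recompress(original)
print(len(original), "->", len(smaller))
```

In practice you would reach for dedicated optimizers rather than this sketch, since they also try alternative scanline filters, but the byte savings come from the same two ideas: better deflate effort and stripped metadata.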
On a positive note, the amount of Flash being downloaded dropped 10%. Sadly, the percentage of sites using Flash dropped only from 44% to 43%, but at least those SWFs are downloading faster.
Adoption of Best Practices
I personally love the HTTP Archive for tracking the adoption of web performance best practices. Some trends year-over-year include:
- The percent of resources that had caching headers grew from 42% to 46%. It’s great that the use of caching is increasing, but the fact that 54% of requests still don’t have any caching headers is a missed opportunity.
- Sites using the Google Libraries API jumped from 10% to 16%. Using a CDN with distributed locations and the ability to leverage caching across websites make this a positive for web performance.
- On the downside, websites with at least one redirect grew from 59% to 66%.
- Websites using custom fonts quadrupled from 2% to 8%. I’ve written about the performance dangers of custom fonts. Just today I did a performance analysis of Maui Rippers and discovered the reason the site didn’t render for 6+ seconds was a 280K font file.
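The caching stat in the first bullet is straightforward to measure for your own pages: a response counts as having caching headers when it carries an explicit freshness lifetime. A minimal sketch (the header dicts below are made-up examples, not HTTP Archive data):

```python
def has_caching_headers(headers):
    """True if a response carries an explicit freshness lifetime:
    a Cache-Control max-age directive or an Expires date."""
    cache_control = headers.get("cache-control", "").lower()
    return "max-age=" in cache_control or "expires" in headers

# Hypothetical response headers (lowercase keys), one dict per resource.
responses = [
    {"cache-control": "public, max-age=31536000"},  # cacheable for a year
    {"expires": "Thu, 31 Dec 2037 23:55:55 GMT"},   # cacheable via Expires
    {"content-type": "text/html"},                  # no caching headers at all
    {"cache-control": "no-store"},                  # explicit, but no lifetime
]
cacheable = sum(has_caching_headers(r) for r in responses)
print("%d%% of responses have caching headers" % (100 * cacheable // len(responses)))
# → 50% of responses have caching headers
```

Running a check like this over every resource a page loads is exactly the kind of per-request bookkeeping the HTTP Archive automates at scale.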
It’s compelling to see how best practices are adopted by the top websites as compared to more mainstream websites. Table 2 shows various stats for the top 100 and top 1000 websites, as well as all 53,614 websites in the last batch run.
Table 2. Best Practices for Top 100, Top 1000, All

| Stat | Top 100 | Top 1000 | All |
|---|---|---|---|
The overall trend shows that performance best practices drop dramatically outside of the Top 100 websites. The most significant differences are:
- Total size goes from 509 kB to 805 kB to 962 kB.
- The total number of HTTP requests follows a similar pattern, growing from 57 to 90 and then dipping slightly to 86.
- The use of future caching headers is high for the top 100 at 70%, but drops to 58% for the top 1000 and to 42% across all sites.
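The future caching headers tracked in the last bullet are cheap for any site to emit. A sketch using only the Python standard library (the one-year lifetime and helper name are illustrative choices, not a standard):

```python
import time
from email.utils import formatdate

def far_future_headers(days=365):
    """Response headers granting a long freshness lifetime to a static,
    versioned asset: HTTP/1.1 Cache-Control plus an HTTP/1.0 Expires date."""
    max_age = days * 86400
    return {
        "Cache-Control": "public, max-age=%d" % max_age,
        "Expires": formatdate(time.time() + max_age, usegmt=True),
    }

print(far_future_headers()["Cache-Control"])  # → public, max-age=31536000
```

The usual caveat applies: pair a far-future lifetime with versioned asset URLs (e.g. a hash in the filename), so a deploy can bust the cache by changing the URL.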
The Web has a long tail. It’s not enough for the top sites to have high performance. WPO best practices need to find their way to the next tier of websites and on to the brick-and-mortar, mom-and-pop, and niche sites that we all visit. More awareness, more tools, and more automation are the answer. I can’t wait to read the January 2013 update to this blog post and see how we did. Here’s to a faster and stronger Web in 2012!