HTTP Archive: Top 100, Top 1000 & compare stats

May 12, 2011 3:29 am | 2 Comments

Right now the HTTP Archive analyzes the world’s top 17,000 web pages, gathering information about each site’s construction. It’s interesting data, especially for a performance junkie like me. Subsetting the data for comparisons is a challenge given the numerous ways this long list of URLs could be sliced.

This past week I added two new subsets: Top 100 and Top 1000. So now when you go to the Trends and Stats pages you have the following choices:

  • All – The entire set of ~17K web pages, although the exact URLs might vary from run to run due to errors, etc.
  • Intersection – The set of URLs for which data was gathered in every run. Right now there are ~15K that have data in every run so far. This is great for apples-to-apples comparisons.
  • Top 100 – The world’s Top 100 based on Alexa.
  • Top 1000 – The world’s Top 1000 based on Alexa.
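To make the difference between "All" and "Intersection" concrete, here is a minimal sketch in Python. It is not the HTTP Archive's actual code: the run labels, the example URLs, and the function name are all made up for illustration, and I'm assuming each run can be represented as a set of URLs that returned data.

```python
# Hypothetical data: each run maps to the set of URLs that produced results.
# The real runs cover ~17K URLs; these labels and URLs are placeholders.
runs = {
    "run-1": {"http://a.example/", "http://b.example/", "http://c.example/"},
    "run-2": {"http://a.example/", "http://c.example/", "http://d.example/"},
    "run-3": {"http://a.example/", "http://b.example/", "http://c.example/"},
}


def intersection_of_runs(runs):
    """Return only the URLs that have data in every run."""
    url_sets = list(runs.values())
    if not url_sets:
        return set()
    return set.intersection(*url_sets)


common = intersection_of_runs(runs)
print(sorted(common))
```

Because a URL that errors out in even one run drops out of the result, the intersection can only stay the same size or shrink as more runs are added. That is the price of the apples-to-apples comparison.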

There are some amazing differences between these sets of sites. To make it easier to explore them, I added the Compare Stats page. You can pick two different runs and sets of URLs and see their stats charts side by side. Here are some of the differences between the Top 100 and Top 1000 that surprised me in the April 30, 2011 run:

  • total bytes downloaded: 401 kB vs 674 kB (Top 100 vs Top 1000)
  • use of jQuery: 27% vs 43%
  • use of Google Analytics: 25% vs 52%
  • pages using Flash: 35% vs 49%
  • resources with no caching headers: 26% vs 41%
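If you want to see how big these gaps are in relative terms, a quick sketch like the following works. The numbers are the ones listed above; the dictionary keys and the function name are my own labels for this example, not anything the HTTP Archive exposes.

```python
# Stats from the April 30, 2011 run as listed above.
# Key names are illustrative labels, not HTTP Archive field names.
top100 = {
    "total_kb": 401,
    "jquery_pct": 27,
    "google_analytics_pct": 25,
    "flash_pct": 35,
    "no_caching_headers_pct": 26,
}
top1000 = {
    "total_kb": 674,
    "jquery_pct": 43,
    "google_analytics_pct": 52,
    "flash_pct": 49,
    "no_caching_headers_pct": 41,
}


def compare_stats(a, b):
    """For each stat, return (absolute difference, ratio) of b relative to a."""
    return {key: (b[key] - a[key], b[key] / a[key]) for key in a}


for name, (diff, ratio) in compare_stats(top100, top1000).items():
    print(f"{name}: +{diff} ({ratio:.2f}x)")
```

Looking at ratios rather than raw differences makes it easier to spot which stat moves the most: Google Analytics usage roughly doubles, while total bytes grow by about two thirds.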

[Chart: Top 100 Bytes Downloaded]

[Chart: Top 1000 Bytes Downloaded]

One takeaway is that stats that are critical for good performance (bytes downloaded, caching) worsen when we expand from the Top 100 to the Top 1000. What happens if we look further down the tail at “All”? I encourage you to check it out yourself and see how your site compares.

2 Responses to HTTP Archive: Top 100, Top 1000 & compare stats

  1. I’ve noticed there is a stronger correlation of page load time to size transferred than to connections made. How does this affect your stance in High Performance Web Sites on connections being the highest contributing factor to page load time?

  2. @Anthony: They’re both important. Make sure to check out the stats correlating various signals to load time and render time. Feel free to post another comment with a link to your data analysis.