HTTP Archive: max-age

April 18, 2011 9:40 pm | 10 Comments

There’s a long list of interesting stats to be added to the HTTP Archive. I’m planning on knocking those off at about one a week. (If someone wants to help that’d be great – contact me. Familiarity with MySQL and Google Charts API is a plus.)

Last week I added an interesting stat looking at the cache lifetime being specified for resources – specifically the value set in the Cache-Control: max-age response header. As a reminder, the HTTP Archive is currently analyzing the top ~17K websites worldwide. Across those websites a total of ~1.4M resources are requested. The chart below shows the distribution of max-age values across all those resources.

56% of the resources don’t have a max-age value and 3% have a zero or negative value. That means only 41% of resources are cacheable. In more concrete terms, the average number of resources downloaded per page is 81. 33 of those are cacheable, but the other 48 will likely generate an HTTP request on every page view. Ouch! That’s going to slow things down. Only 24% of resources are cacheable for more than a day. Adding caching headers is an obvious performance win that needs wider adoption.

10 Comments

HTTP Archive: URL list, Flash trends

April 5, 2011 10:45 pm | 5 Comments

Last week I announced the launch of the HTTP Archive. The feedback has been very positive. I’ve already heard from a handful of performance gurus who have downloaded the data and done additional analyses. This was a major goal of the project and I’m excited to see it happening.

I made a few changes to HTTP Archive that I wanted to share in this blog post.

First, there’s a potential for apples-to-oranges comparisons because the list of URLs “crawled” by HTTP Archive changes from run to run due to errors and changes in the “top N” sorting of sources like Alexa and Fortune 500. When comparing two runs it’s unclear if differences are caused from a change in the sample set or actual changes in Internet behavior. This was exemplified by this tweet from @orionlogic:

Website flash usage dropped %16 since 6 months http://t.co/XI7IWvc (via httparchive.org + @souders ) #noflash

The link contains two pie charts from HTTP Archive:

The issue is that the number of URLs grew from ~1000 in October 2010 to ~17,000 in April 2011. Those additional 16,000 websites have different behavior when it comes to using Flash. If we compare Nov 15 2010 to Mar 29 2011, both of which use ~17,000 URLs, the change is only 2%.

I made some changes to mitigate this issue.

  • The first three runs that were done with only 1000 URLs are now hidden in the UI. The data is still available.
  • Similar confusion can happen when viewing trending charts. The fix there is to use the “intersection” set of URLs across all runs. I added a note next to the “choose URLs” pick list to point out the benefit of choosing “intersection”. I moved the plot of “URLs Analyzed” to the top so it’s more apparent when the number of URLs changes from run to run.

On another note, Nicole Sullivan suggested I add a trending chart for transfer size and number of requests for Flash, in addition to the existing charts for HTML, JavaScript, CSS, and Images. The hypothesis was this might explain the increase in image sizes that I presented in my previous post – perhaps Flash was on the decline being replaced by HD images. The chart shows a slight decline in the size and number of Flash files, but not enough to explain the 61 kB increase in image transfer size.

I made several other fixes that are less visible. Several people have submitted requests for new stats. I’ll keep knocking those off and blogging about them.

5 Comments