HTTP Archive: servers and most 404s

May 9, 2011 5:11 am | 5 Comments

I launched the HTTP Archive about a month ago. The reaction has been positive, including supportive tweets from Tim O’Reilly, Werner Vogels, Robert Scoble, and John Resig. I’m also excited about the number of people who have already started contributing to the project. Two new stats charts are available thanks to patches from open source contributors.

James Byers contributed the patch for generating the Most Common Servers pie chart. This chart is similar to BuiltWith’s Web Server chart. BuiltWith shows a higher presence of IIS than shown here. Keep in mind the sample sets are different – the HTTP Archive hits the world’s top ~17K URLs while BuiltWith is covering 1M URLs.
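For readers curious how a chart like this can be derived, here is a minimal sketch in Python. It is not the actual patch; it assumes you already have the raw Server response header for each page, and the helper names (server_family, server_shares) and the sample header values are made up for illustration.

```python
from collections import Counter


def server_family(header_value):
    """Reduce a raw Server header such as 'Apache/2.2.3 (CentOS)' to 'Apache'."""
    value = (header_value or "").strip()
    if not value:
        return "(none)"
    # Keep only the product name: drop a "/version" suffix and any " (comment)".
    return value.split("/")[0].split(" (")[0].strip()


def server_shares(header_values):
    """Return {server family: percentage of pages}, most common first."""
    counts = Counter(server_family(v) for v in header_values)
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.most_common()}


# Hypothetical sample data, one Server header per page.
sample = ["Apache/2.2.3 (CentOS)", "Apache", "nginx/0.8.54", "Microsoft-IIS/7.5"]
print(server_shares(sample))
```

Grouping by product name rather than the full header string is a design choice: it keeps the pie chart readable, at the cost of hiding version differences.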

The other new chart comes from Carson McDonald. It shows pages with the most 404s. Definitely a list you don’t want to find your website on.

I’ve added some other features I’ll blog about tomorrow and am planning a bigger announcement later this week, so stay tuned for some more HTTP Archive updates.

5 Responses to HTTP Archive: servers and most 404s

  1. The live servers chart shows GSE at around 20%. Which web server is that?

  2. @Frank: That link was wrong – it pointed to some test data. It’s fixed now and GSE isn’t there.

  3. Pages with redirects (3xx) = 62% ?

    That seems odd.

  4. @Danil: That’s the percentage of pages that have at least one 301 or 302 response – not just for the HTML document but for any resource in the page.

  5. Great addition. It would also be interesting to compare load times by server type, although there is a great difference in the purpose of the different web servers.

    I bet a large percentage of the content is being served by Squid or similar proxies. This info can normally be extracted from the “Via” header, and it should probably be included in the comparison. Will try to dig into the project’s code…

    Apart from that, HTTP Archive is definitely a great service. I am writing to the rest of the devs and management that our actions and decisions are being [permanently] recorded, and that the record is available not only to customers but to competitors as well.

    Thanks for the great work,
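
The redirect statistic discussed in the comments above counts a page once if any of its responses is a 301 or 302, whether that is the HTML document or any other resource. A minimal Python sketch of that calculation follows. It assumes the data is available as a mapping from page URL to the list of status codes for every response in the page; the function name, the data layout, and the sample URLs are hypothetical and not taken from the HTTP Archive code.

```python
def percent_pages_with_redirects(pages, redirect_codes=(301, 302)):
    """Percentage of pages with at least one redirect among all their responses.

    `pages` maps a page URL to the status codes of every response in that
    page (the HTML document plus images, scripts, stylesheets, and so on).
    """
    if not pages:
        return 0.0
    hits = sum(
        1
        for statuses in pages.values()
        if any(status in redirect_codes for status in statuses)
    )
    return 100.0 * hits / len(pages)


# Hypothetical sample: two of the four pages contain a 301 or 302.
sample_pages = {
    "http://example.com/": [301, 200, 200],
    "http://example.org/": [200, 200, 404],
    "http://example.net/": [200, 302, 200, 200],
    "http://example.edu/": [200],
}
print(percent_pages_with_redirects(sample_pages))
```

Because a single redirected image or script is enough to count the whole page, a figure like 62% is less surprising than it looks at first.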