Add your site & custom fonts

November 17, 2011 7:45 pm | 2 Comments

The Nov 15 2011 crawls for the HTTP Archive and HTTP Archive Mobile are done. Two new things were added.

Add your site

Our goal is to crawl the world’s top 1,000,000 URLs. This month we doubled the number of URLs from 17K to 35K. We’re still a ways away but making progress. But what if you’d like your website to be in the HTTP Archive but it isn’t in the top 1M?

Now you can add your site to the HTTP Archive. If it’s already in the list we’ll tell you and point you to any data that’s been gathered so far. If it’s not in the list we’ll queue it up for the next crawl. We moderate all additions to make sure the URL is valid. We also have a limit of 1 URL per website. We strive to crawl a site’s main URL, but not all the subpages within a site.

Custom Fonts

I’ve been thinking more about custom fonts after Typekit’s acquisition by Adobe and seeing Jeff Veen at Velocity Europe. (Make sure to watch the video of Jeff’s talk – it’s an amazing presentation with a humorous start.) So this week I added a chart to track the adoption of custom fonts.

Typekit is clearly on to something – the use of custom fonts has tripled in one year. I warn against using @font-face for performance reasons, but performance isn’t all that matters. (Gasp!) Custom fonts obviously have aesthetic benefits that are attractive to website owners.

Fortunately, Typekit has several performance optimizations in how they load fonts. They combine all the fonts in a single stylesheet for browsers that support data: URIs. The fonts are served over a CDN. The fonts are only cacheable for 5 minutes, which hurts repeat visits, but I believe they’re working on longer cache times.
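
To make the data: URI idea concrete, here is a minimal sketch of how several fonts could be folded into one stylesheet so the browser makes a single request. This is not Typekit’s actual implementation; the function names, the font family names, and the font/woff MIME type are all assumptions made for illustration.

```python
import base64

# Assumption: fonts are WOFF files and are labelled with the font/woff MIME type.
FONT_MIME = "font/woff"


def font_face_rule(family: str, font_bytes: bytes) -> str:
    """Return one @font-face rule whose src is an inline base64 data: URI."""
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url(data:{FONT_MIME};base64,{encoded}) format('woff');\n"
        "}\n"
    )


def combined_stylesheet(fonts: dict[str, bytes]) -> str:
    """Concatenate one rule per font so all fonts arrive in a single stylesheet."""
    return "\n".join(font_face_rule(name, data) for name, data in fonts.items())


if __name__ == "__main__":
    # Hypothetical placeholder bytes; real code would read the font files from disk.
    print(combined_stylesheet({"Heading Font": b"abc", "Body Font": b"xyz"}))
```

The tradeoff is that base64 makes each font roughly a third larger, and the stylesheet is only as cacheable as its own response headers allow, so a short cache lifetime applies to every font embedded in it.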

For truly fast and robust font loading we need to lean on browser developers to implement better caching for fonts and better timeout choices during rendering. I’ll be talking about this during my High Performance HTML5 session at QCon on Friday.

2 Responses to Add your site & custom fonts

  1. It’s great we could add websites to HTTP Archive.

    I added a website, buyonme-dot-com, but it showed: “the site is already in the list of URLs.”

    When I check the list, I cannot find buyonme. Maybe the data is not updated yet?

    Thanks for Steve’s effort and all the things you have done for the Internet.

  2. @yanyun: It is a little confusing. Your example will help clarify the situation. There are 1,000,000 URLs in the list, but right now data is only being gathered for the top 50,000. Your site is ranked 101,213, so it IS in the list but we’re not yet gathering data for it. However, now that you’ve requested it to be added, it will automatically be included in all future runs, regardless of rank.

    Runs happen on the 1st and 15th of each month.
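
The inclusion rule and schedule described in this reply can be sketched in a few lines. This is only an illustration of the rule as stated, not the HTTP Archive’s code; the function names and the CRAWL_CUTOFF constant are hypothetical.

```python
from datetime import date
from typing import Optional

# Cutoff taken from the reply above: data is gathered for the top 50,000 URLs.
CRAWL_CUTOFF = 50_000


def is_crawled(rank: Optional[int], requested: bool) -> bool:
    """A URL is crawled if someone requested it, or if it ranks within the cutoff."""
    return requested or (rank is not None and rank <= CRAWL_CUTOFF)


def next_run(today: date) -> date:
    """Return the next run date on or after today (runs are on the 1st and 15th)."""
    if today.day == 1:
        return today
    if today.day <= 15:
        return today.replace(day=15)
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)


if __name__ == "__main__":
    print(is_crawled(101_213, requested=False))  # in the list, but below the cutoff
    print(is_crawled(101_213, requested=True))   # requested, so rank no longer matters
    print(next_run(date(2011, 11, 17)))
```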