Fewer requests through resource packages

A few months back, I visited Alexander Limi at the Mozilla offices to discuss his proposal for resource packages. I liked the idea then and am excited that he has published the proposal in his article Making browsers faster: Resource Packages. I’ll start by reviewing the concept, then comment on some of the issues that have come up, and finish with some data on how this would affect the number of HTTP requests on the Alexa U.S. top 10 web sites.

Over the last few years I’ve talked with several people about the concept of bundling multiple resources into a single response, with the ultimate goal being to increase bandwidth utilization on the client. Typically, browsers only take advantage of about 30% of their bandwidth capacity when loading a web page because of the overhead of HTTP and TCP, and the various blocking behaviors in browsers. These previous resource bundling discussions got waylaid by implementation details and deployment challenges. The strength of Alexander’s proposal is its simplicity around these two tenets:

  1. the resource package is a zip file – Zip files have widespread support, are easy to create, require low CPU to zip and unzip, and are well understood by developers and users.
  2. the implementation is transparent to browsers that don’t support it – Developers can add resource packages to their existing pages with a LINK tag using the “resource-package” relationship. Browsers that don’t recognize “resource-package” LINKs (which is all browsers right now) ignore them and load the page the same way they do today. As browsers start to support “resource-package” LINKs, users start getting faster pages. Developers only have to create one version of their HTML.

These are simple principles, but they are why this proposal stands out.

How It Would Work

The first step is to identify which resources to bundle. There’s no simple recipe for doing this. It’s similar to figuring out which images to sprite, or which scripts to concatenate together. You have to look across your web pages and see if there is a core set of resources that are used over and over – rounded corners, logos, core JavaScript and CSS files, etc. You don’t want to bundle every image, script, and stylesheet across all your pages into one resource package. Similarly, you don’t want to create a separate resource package for every page – it’s wasteful to have the same resources repeated across multiple packages. Take the middle path.

The next step is to create a manifest.txt file that lists the resources, for example:

js/jquery.js
js/main.js
css/reset.css
css/grid.css
css/main.css
images/logo.gif
images/navbar.png

Then zip the manifest.txt and the files together.
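
As a minimal sketch of this step, the package could be built with Python's standard zipfile module. The function name build_resource_package and the placeholder file contents are my own assumptions for illustration; the only things taken from the proposal are the one-path-per-line manifest.txt and the fact that it is zipped together with the files. The manifest is written first because a later section relies on it being at the front of the zip.

```python
import io
import zipfile

def build_resource_package(files):
    """Return the bytes of a zip whose first entry is manifest.txt.

    `files` maps each relative path (as it appears in the page) to its content.
    """
    # One path per line, in the same order the files are added below.
    manifest = "\n".join(files) + "\n"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.txt", manifest)  # written first on purpose
        for path, data in files.items():
            zf.writestr(path, data)
    return buf.getvalue()

# Placeholder contents; a real build script would read the files from disk.
files = {
    "js/jquery.js": b"/* jquery */",
    "js/main.js": b"/* main */",
    "css/reset.css": b"/* reset */",
    "css/grid.css": b"/* grid */",
    "css/main.css": b"/* main */",
    "images/logo.gif": b"GIF89a",
    "images/navbar.png": b"\x89PNG",
}
pkg = build_resource_package(files)
names = zipfile.ZipFile(io.BytesIO(pkg)).namelist()
```

Here names starts with manifest.txt followed by the seven resources, and pkg is what would be saved as pkg.zip.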

Finally, add a line like this to your HTML document:

<link rel="resource-package" type="application/zip" href="pkg.zip" />

This causes the browser to download the zip file, extract the files, and cache them. As the browser continues parsing the HTML page, it encounters other resources, such as:

<script src="js/jquery.js" type="text/javascript"></script>

Browsers that don’t support resource packages would make a separate HTTP request for jquery.js. But browsers that support resource packages would already have jquery.js in cache, avoiding the need for any more HTTP requests. This is where the performance benefit is achieved.
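
To make the difference concrete, here is a rough model of the request count for the two kinds of browsers. It is a simplification I am assuming for illustration, not a description of any browser's cache: count_requests and the extra images/photo.jpg resource are hypothetical, and the HTML document itself is not counted.

```python
def count_requests(page_resources, manifest, supports_packages):
    """Count HTTP requests for a page's subresources (the HTML itself is excluded)."""
    if not supports_packages:
        # The resource-package LINK is ignored, so every resource is fetched on its own.
        return len(page_resources)
    packaged = set(manifest)
    # One request for pkg.zip, plus one for each resource the package does not contain.
    return 1 + sum(1 for url in page_resources if url not in packaged)

manifest = [
    "js/jquery.js", "js/main.js",
    "css/reset.css", "css/grid.css", "css/main.css",
    "images/logo.gif", "images/navbar.png",
]
# The page uses everything in the package plus one page-specific image (hypothetical).
page_resources = manifest + ["images/photo.jpg"]

old_browser = count_requests(page_resources, manifest, supports_packages=False)  # 8
new_browser = count_requests(page_resources, manifest, supports_packages=True)   # 2
```

The same HTML produces eight requests in a browser that ignores the LINK tag and two in a browser that honors it.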

Why? What about…?

To a large degree, bundling is possible today. Multiple scripts can be concatenated together. Similar story for stylesheets. Images can be sprited together. But even if you did all of this, you’d have three HTTP requests. Resource packages bundle all of these into one HTTP request. Also, although stylesheets are typically (hopefully) all in the document HEAD, you might have some scripts at the top of your HTML document and others at the bottom. You’d either have to bundle them all together at the top (and slow down your page for JavaScript that could have been loaded lower), or create two concatenated scripts. And even though I’ve created SpriteMe, creating sprites is still hard. Resource packages allow for files (scripts, stylesheets, images, etc.) to be developed and maintained separately, but more easily bundled for efficient download to the client.

Comments on various blogs and threads have raised concerns about resource packages hurting performance because they decrease parallel downloading. This should not be a concern. Worst case, developers could create two resource packages to get parallel downloading, but that won’t be necessary. I haven’t done the tests to prove it, but I believe downloading multiple resources across a single connection in streaming fashion will achieve greater bandwidth utilization than sharding resources across multiple domains for greater parallelization. Also, there will always be other resources that live outside the resource package. Jumping ahead a little bit, across the Alexa U.S. top 10 web sites, 37% of the resources in the page would not make sense to put into a resource package. So there will be plenty of opportunity for other downloads to happen in parallel.

Another advantage of zip files is that they can be unpacked as the bytes arrive. That’s where the manifest.txt comes in. By putting the manifest.txt first in the zip file, browsers can peel off the resources one-by-one and start acting on them immediately. But this means browsers are likely to block parsing the page until at least the manifest.txt has arrived (so that they know what to do when they encounter the next resource in the page). There are multiple reasons why this is not a concern.

  • Some of the non-packaged resources could be placed above the resource package – presumably browsers would start those downloads before blocking for the resource package’s manifest.txt.
  • The resource package’s LINK tag could (should) be included in the first part of the HTML document that is flushed early to the client, so the browser could start the request for the zip file even before the entire HTML document has arrived.
  • A final idea that isn’t in the proposal – I’d like to consider adding the manifest to the HTML document. This would add page weight, but if manifest blocking was an issue developers could use this workaround.
  • There should also be an ASYNC or DEFER attribute for resource packages that tells the browser not to wait. Again, this would be an opt-in path for developers who felt manifest blocking was an issue.
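
The "unpack as the bytes arrive" idea can be sketched by walking a zip's local file headers from front to back instead of jumping to the central directory at the end of the file. This is only an illustration under stated assumptions: iter_entries is a hypothetical helper, it handles only stored and deflated entries, it assumes each entry's size is recorded in its local header (no trailing data descriptor), and it reads from an in-memory stream rather than a socket that delivers partial data.

```python
import io
import struct
import zipfile
import zlib

# Fixed 30-byte local file header that precedes every entry in a zip file.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")

def iter_entries(stream):
    """Yield (name, data) for each entry, reading the stream strictly front to back."""
    while True:
        raw = stream.read(LOCAL_HEADER.size)
        if len(raw) < LOCAL_HEADER.size or raw[:4] != b"PK\x03\x04":
            return  # no more local headers: central directory or end of stream
        (_sig, _ver, flags, method, _time, _date,
         _crc, csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack(raw)
        if flags & 0x08:
            raise ValueError("entry sizes are in a trailing data descriptor; not handled")
        name = stream.read(name_len).decode("utf-8")
        stream.read(extra_len)  # skip the optional extra field
        body = stream.read(csize)
        if method == 8:  # deflate
            body = zlib.decompress(body, -15)  # raw deflate data, no zlib header
        elif method != 0:  # 0 means stored (uncompressed)
            raise ValueError(f"unsupported compression method {method}")
        yield name, body

# Build a small package in memory, manifest first, with placeholder contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("manifest.txt", "js/main.js\ncss/main.css\n")
    zf.writestr("js/main.js", b"/* main script */")
    zf.writestr("css/main.css", b"/* main styles */")

entries = list(iter_entries(io.BytesIO(buf.getvalue())))
first_name, manifest_bytes = entries[0]
```

Because the manifest is the first entry, it is available as soon as its bytes have been read, before any of the other files. The example collects everything into a list for simplicity; a browser would presumably act on each entry as it is yielded.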

My main issue with resource packages was giving up the granularity of setting caching headers for each resource individually. But as I thought about this, I realized that many web sites I look at version their resources today and give them all the same expiration date. This was further confirmed during my examination of the top 10 sites (next section). If there are some resources that change frequently (more often than weekly), they might be better in a separate resource package, or not packaged at all.

Survey of Resource Package Potential

I’m all about ideas that make a real impact. So I wanted to quantify how resource packages could be used in the real world. I examined the Alexa U.S. top 10 web sites and cataloged every resource that was downloaded. For each web site, I identified one or two domains that hosted the bulk of the resources. For the resources on those domains, I only included resources that had an expiration date at least one week in the future. Although you could package resources that weren’t cacheable, this would likely be a greater burden on the backend. It makes the most sense to package resources that are intended to be cached. The full details are available in this Google spreadsheet.
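
The selection rule described above can be written down as a small filter. This is a sketch of the criteria only, not the script used for the survey: select_packageable, the record fields, the example hosts, and the dates are all hypothetical.

```python
from datetime import datetime, timedelta

def select_packageable(resources, main_hosts, now, min_lifetime=timedelta(days=7)):
    """Keep resources served from a main host that expire at least `min_lifetime` after `now`."""
    selected = []
    for res in resources:
        expires = res["expires"]  # None means no expiration date was sent
        if res["host"] in main_hosts and expires is not None and expires - now >= min_lifetime:
            selected.append(res["url"])
    return selected

now = datetime(2010, 1, 1)  # arbitrary "time of the survey" for the example
resources = [
    {"url": "http://static.example.com/logo.gif", "host": "static.example.com",
     "expires": datetime(2011, 1, 1)},   # main host, far-future expiry: keep
    {"url": "http://static.example.com/news.js", "host": "static.example.com",
     "expires": datetime(2010, 1, 2)},   # expires within a week: skip
    {"url": "http://ads.example.net/banner.gif", "host": "ads.example.net",
     "expires": datetime(2011, 1, 1)},   # not one of the main domains: skip
    {"url": "http://static.example.com/page.css", "host": "static.example.com",
     "expires": None},                   # no expiration date: skip
]
chosen = select_packageable(resources, {"static.example.com"}, now)
```

Only logo.gif survives the filter in this example.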

With this raw data in hand, I looked at the potential savings that could be achieved by using resource packages. On average, these websites have:

  • 42 resources (separate HTTP requests)
  • 26 of these would make sense to put into one or two resource packages

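Using resource packages would drop the total number of HTTP requests on these sites from 42 to 17: the 16 resources left unpackaged plus one request for the package itself (or 18 if two packages are used).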

The results for each top 10 web site are shown in the following table. The last two columns show a size comparison. In the case of Amazon, for example, there were 73 HTTP requests for resources in the page. Of these, 47 were being downloaded from two main domains with a future expiration date, so could be put into a resource package. The download size of those 47 individual responses was 263K. After zipping them together, the size of the zip file was 232K.

            original #    # of requests   individual   zip file
            of requests   packaged        file size    size
Amazon      73            47              263K         232K
Bing        20            11              33K          30K
Blogger     43            34              105K         81K
eBay        48            38              179K         159K
Facebook    43            43              303K         260K
Google      11            8               29K          25K
MySpace     41            10              39K          35K
Wikipedia   61            21              73K          63K
Yahoo       48            45              294K         271K
YouTube     27            5               78K          71K
average     42            26              140K         123K

This is a best case analysis. It’s possible some of these resources might be page-specific or short-lived, and thus not good candidates for packaging. Also, the zip file size does not include the manifest.txt file size, although this would be small.
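
For anyone who wants to check the arithmetic, the averages and percentages can be recomputed from the table. The numbers below are copied from the table; the variable names and the one-package assumption in the last line are mine.

```python
# (site, original requests, packageable requests, individual size in K, zip size in K)
rows = [
    ("Amazon", 73, 47, 263, 232),
    ("Bing", 20, 11, 33, 30),
    ("Blogger", 43, 34, 105, 81),
    ("eBay", 48, 38, 179, 159),
    ("Facebook", 43, 43, 303, 260),
    ("Google", 11, 8, 29, 25),
    ("MySpace", 41, 10, 39, 35),
    ("Wikipedia", 61, 21, 73, 63),
    ("Yahoo", 48, 45, 294, 271),
    ("YouTube", 27, 5, 78, 71),
]
n = len(rows)
total_requests = sum(r[1] for r in rows)   # 415
total_packaged = sum(r[2] for r in rows)   # 262
total_individual = sum(r[3] for r in rows) # 1396
total_zip = sum(r[4] for r in rows)        # 1227

avg_requests = total_requests / n          # 41.5, shown as 42 in the table
avg_packaged = total_packaged / n          # 26.2, shown as 26
# Share of resources that stay outside any package, in percent (about 37).
unpackaged_pct = round(100 * (total_requests - total_packaged) / total_requests)
# Size saved by zipping versus the individual responses, in percent (about 12).
zip_saving_pct = round(100 * (total_individual - total_zip) / total_individual)
# Average requests left per site, assuming a single package per site.
avg_after = avg_requests - avg_packaged + 1
```

With one package per site the average comes out to a little over 16 requests, which matches the 17 quoted earlier once the averages are rounded to 42 and 26.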

Resource packages promise to greatly reduce the number of HTTP requests in web pages, producing pages that load faster. Once we have browsers that support this feature, the impact on page load times can be measured. I’m excited to hear from Alexander that Firefox 3.7 will support resource packages. I hope other browser vendors will comment on this proposal and support this performance feature.