Fewer requests through resource packages

November 18, 2009 12:31 am | 39 Comments

A few months back, I visited Alexander Limi at the Mozilla offices to discuss his proposal for resource packages. I liked the idea then and am excited that he has published the proposal in his article Making browsers faster: Resource Packages. I’ll start by reviewing the concept, then comment on some of the issues that have come up, and finish with data on how this would impact the Alexa U.S. top 10 web sites in terms of HTTP requests.

Over the last few years I’ve talked with several people about the concept of bundling multiple resources into a single response, with the ultimate goal of increasing bandwidth utilization on the client. Typically, browsers take advantage of only about 30% of their bandwidth capacity when loading a web page, because of HTTP and TCP overhead and the various blocking behaviors in browsers. Those earlier resource bundling discussions got waylaid by implementation details and deployment challenges. The strength of Alexander’s proposal is the simplicity of its two tenets:

  1. the resource package is a zip file – Zip files have widespread support, are easy to create, require low CPU to zip and unzip, and are well understood by developers and users.
  2. the implementation is transparent to browsers that don’t support it – Developers can add resource packages to their existing pages with a LINK tag using the “resource-package” relationship. Browsers that don’t recognize “resource-package” LINKs (which is all browsers right now) ignore them and load the page the same way they do today. As browsers start to support “resource-package” LINKs, users start getting faster pages. Developers only have to create one version of their HTML.

These are simple principles, but they are why this proposal stands out.

How It Would Work

The first step is to identify which resources to bundle. There’s no simple recipe for doing this. It’s similar to figuring out which images to sprite, or which scripts to concatenate together. You have to look across your web pages and see if there is a core set of resources that are used over and over – rounded corners, logos, core JavaScript and CSS files, etc. You don’t want to bundle every image, script, and stylesheet across all your pages into one resource package. Similarly, you don’t want to create a separate resource package for every page – it’s wasteful to have the same resources repeated across multiple packages. Take the middle path.

The next step is to create a manifest.txt file that lists the resources, for example:

js/jquery.js
js/main.js
css/reset.css
css/grid.css
css/main.css
images/logo.gif
images/navbar.png

Then zip the manifest.txt and the files together.
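
How you build the zip doesn’t matter, as long as manifest.txt ends up as the first entry. As a rough illustration (not part of the proposal; the file names are the ones from the manifest above), here is one way to do it with Python’s zipfile module:

# sketch: build a resource package with manifest.txt as the first entry
import zipfile

resources = [
    "js/jquery.js",
    "js/main.js",
    "css/reset.css",
    "css/grid.css",
    "css/main.css",
    "images/logo.gif",
    "images/navbar.png",
]

with zipfile.ZipFile("pkg.zip", "w", zipfile.ZIP_DEFLATED) as pkg:
    # write the manifest first so a browser reading the zip front to back
    # sees it before any of the resources arrive
    pkg.writestr("manifest.txt", "\n".join(resources) + "\n")
    for path in resources:
        pkg.write(path)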

Finally, add a line like this to your HTML document:

<link rel="resource-package" type="application/zip" href="pkg.zip" />

This causes the browser to download the zip file, extract the files, and cache them. As the browser continues parsing the HTML page, it encounters other resources, such as:

<script src="js/jquery.js" type="text/javascript"></script>

Browsers that don’t support resource packages would make a separate HTTP request for jquery.js. But browsers that support resource packages would already have jquery.js in cache, avoiding the need for any more HTTP requests. This is where the performance benefit is achieved.

Why? What about…?

To a large degree, bundling is possible today. Multiple scripts can be concatenated together. Similar story for stylesheets. Images can be sprited together. But even if you did all of this, you’d have three HTTP requests. Resource packages bundle all of these into one HTTP request. Also, although stylesheets are typically (hopefully) all in the document HEAD, you might have some scripts at the top of your HTML document and others at the bottom. You’d either have to bundle them all together at the top (and slow down your page for JavaScript that could have been loaded lower), or create two concatenated scripts. And even though I’ve created SpriteMe, creating sprites is still hard. Resource packages allow for files (scripts, stylesheets, images, etc.) to be developed and maintained separately, but more easily bundled for efficient download to the client.
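
To make the comparison concrete, the “available today” version of the earlier example amounts to something like this (a sketch; the images would still have to be sprited separately, e.g. with SpriteMe):

# sketch: today's bundling -- concatenate scripts and stylesheets by hand
def concatenate(paths, out_path):
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                out.write(src.read())
                out.write(b"\n")  # keep a line break between files

concatenate(["js/jquery.js", "js/main.js"], "js/combined.js")
concatenate(["css/reset.css", "css/grid.css", "css/main.css"], "css/combined.css")
# plus a sprite for logo.gif and navbar.png -- still three HTTP requests
# (combined.js, combined.css, sprite.png) instead of one pkg.zip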

Comments on various blogs and threads have raised concerns about resource packages hurting performance because they decrease parallel downloading. This should not be a concern. Worst case, developers could create two resource packages to get parallel downloading, but that won’t be necessary. I haven’t done the tests to prove it, but I believe downloading multiple resources across a single connection in streaming fashion will achieve greater bandwidth utilization than sharding resources across multiple domains for greater parallelization. Also, there will always be other resources that live outside the resource package. Jumping ahead a little bit, across the Alexa U.S. top 10 web sites, 37% of the resources in the page would not make sense to put into a resource package. So there will be plenty of opportunity for other downloads to happen in parallel.

Another advantage of zip files is that they can be unpacked as the bytes arrive. That’s where the manifest.txt comes in: by putting manifest.txt first in the zip file, browsers can peel off the resources one by one and start acting on them immediately (a rough sketch of this kind of front-to-back reading follows the list below). But this means browsers are likely to block parsing the page until at least the manifest.txt has arrived, so that they know what to do when they encounter the next resource in the page. There are several reasons why this is not a concern:

  • Some of the non-packaged resources could be placed above the resource package – presumably browsers would start those downloads before blocking for the resource package’s manifest.txt.
  • The resource package’s LINK tag could (should) be included in the first part of the HTML document that is flushed early to the client, so the browser could start the request for the zip file even before the entire HTML document has arrived.
  • A final idea that isn’t in the proposal – I’d like to consider adding the manifest to the HTML document. This would add page weight, but if manifest blocking was an issue developers could use this workaround.
  • There should also be an ASYNC or DEFER attribute for resource packages that tells the browser not to wait. Again, developers who felt blocking was an issue could choose this path.
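
To make the streaming idea concrete, here is roughly what reading a zip front to back looks like. This is a Python sketch, not browser code, and it assumes the compressed sizes are present in each local file header (i.e. no streaming data descriptors), which is the case for the zip built in the earlier sketch:

# sketch: peel entries off a zip in arrival order using the local file
# headers, without ever touching the central directory at the end
# (assumes no data descriptors and no zip64 extensions)
import struct
import zlib

def entries_in_arrival_order(data):
    offset = 0
    while data[offset:offset + 4] == b"PK\x03\x04":      # local file header
        (_sig, _ver, _flags, method, _time, _date,
         _crc, csize, _usize, nlen, xlen) = struct.unpack(
            "<4s5H3L2H", data[offset:offset + 30])
        name = data[offset + 30:offset + 30 + nlen].decode("utf-8")
        body_start = offset + 30 + nlen + xlen
        body = data[body_start:body_start + csize]
        if method == 8:                                   # deflate
            body = zlib.decompress(body, -15)             # raw deflate stream
        yield name, body
        offset = body_start + csize

# with the package built as shown earlier, the first entry yielded is
# manifest.txt, so a consumer could start consulting it right away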

My main issue with resource packages was giving up the granularity of setting caching headers for each resource individually. But as I thought about it, I realized that many of the web sites I look at already version their resources, giving them the same expiration dates. This was further confirmed during my examination of the top 10 sites (next section). Resources that change frequently (more often than weekly) might be better placed in a separate resource package, or not packaged at all.

Survey of Resource Package Potential

I’m all about ideas that make a real impact. So I wanted to quantify how resource packages could be used in the real world. I examined the Alexa U.S. top 10 web sites and cataloged every resource that was downloaded. For each web site, I identified one or two domains that hosted the bulk of the resources. For the resources on those domains, I only included resources that had an expiration date at least one week in the future. Although you could package resources that weren’t cacheable, this would likely be a greater burden on the backend. It makes the most sense to package resources that are intended to be cached. The full details are available in this Google spreadsheet.
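
For anyone who wants to reproduce this kind of filtering, the core of it boils down to something like the following sketch. It assumes you have saved the page’s network traffic as a HAR file; the package_candidates helper and file names are hypothetical, and it ignores Cache-Control max-age for simplicity:

# sketch: list the resources that are candidates for a resource package
# (hypothetical helper; assumes a HAR export of the page)
import json
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from urllib.parse import urlparse

def package_candidates(har_path, main_domains, min_freshness=timedelta(days=7)):
    with open(har_path) as f:
        entries = json.load(f)["log"]["entries"]
    now = datetime.now(timezone.utc)
    candidates = []
    for entry in entries:
        url = entry["request"]["url"]
        if urlparse(url).hostname not in main_domains:
            continue                          # only the one or two main domains
        headers = {h["name"].lower(): h["value"]
                   for h in entry["response"]["headers"]}
        if "expires" not in headers:
            continue                          # only resources meant to be cached
        try:
            fresh_for = parsedate_to_datetime(headers["expires"]) - now
        except (TypeError, ValueError):
            continue                          # unparseable Expires header
        if fresh_for >= min_freshness:        # at least one week in the future
            candidates.append(url)
    return candidates

# e.g. package_candidates("example.har", {"static.example.com"})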

With this raw data in hand, I looked at the potential savings that could be achieved by using resource packages. On average, these websites have:

  • 42 resources (separate HTTP requests)
  • 26 of these would make sense to put into one or two resource packages

Using resource packages would drop the total number of HTTP requests on these sites from 42 to 17 (42 − 26 packaged resources + 1 request for the zip file itself)!

The results for each top 10 web site are shown in the following table. The last two columns show a size comparison. In the case of Amazon, for example, there were 73 HTTP requests for resources in the page. Of these, 47 were being downloaded from two main domains with a future expiration date, so could be put into a resource package. The download size of those 47 individual responses was 263K. After zipping them together, the size of the zip file was 232K.

web site     original #     # of requests   individual    zip file
             of requests    packaged        file size     size
Amazon       73             47              263K          232K
Bing         20             11              33K           30K
Blogger      43             34              105K          81K
eBay         48             38              179K          159K
Facebook     43             43              303K          260K
Google       11             8               29K           25K
MySpace      41             10              39K           35K
Wikipedia    61             21              73K           63K
Yahoo        48             45              294K          271K
YouTube      27             5               78K           71K
average      42             26              140K          123K

This is a best-case analysis. Some of these resources might be page-specific or short-lived, and thus not good candidates for packaging. Also, the zip file size does not include manifest.txt, although that file would be small.

Resource packages promise to greatly reduce the number of HTTP requests in web pages, producing pages that load faster. Once we have browsers that support this feature, the impact on page load times can be measured. I’m excited to hear from Alexander that Firefox 3.7 will support resource packages. I hope other browsers will comment on this proposal and support this performance feature.

39 Responses to Fewer requests through resource packages

  1. So do any current browsers support this yet..?

    Apart from FF 3.7, any idea when it will be adopted by the masses (IE, Safari, etc.)..?

  2. So for a site with a 123K zip (the average in your table), changing 1 line of code in the CSS or 1 pixel in 1 image will make visitors re-download the whole 123K – is that what you are saying?

  3. OK, I thought some more about it – silly question. I guess it’s no different from making someone re-download a sprite just because 1 icon changed, or a minified js file because one line changed. It all boils down to choosing what’s best for the use case: are the scripts and images updated frequently? Then use sprites and minified files; otherwise, resource packages.

  4. Well, the question is:
    As far as I understand it, the browser would attach the HTTP headers of the zip file to all included resources.
    So the question to be answered is:
    How should a browser react if the resource package it previously downloaded has changed, but 9 out of 10 of the resources included in the package are, according to the HTML code, still valid and fresh?
    Re-download the whole package? Or download only the resource that changed? And where is the threshold?

  5. If browsers have problems with gzip content, I could see this being just as painful to support. Also, why even go for zip? Surely gzip would be easier as it’s already in the browser (technical quibble).

  6. Gzip files cannot contain multiple files. That’s why you get tar.gz, the tarball groups multiple files into one (but doesn’t compress them) and gzip does the compressing.

    Whatever packaging / compression system they use, it’s a great idea.

  7. What’s the advantage of this over a multipart document as mentioned here – http://developer.yahoo.com/performance/rules.html

  8. Interesting, and good news if everyone starts working with it. I guess this would work with pages built on the fly, as long as programmers split their pages into cacheable parts and dynamic parts. Again, interesting.

  9. I think this is a great concept and I look forward to being able to use it, however I too am a bit concerned about minor tweaks.

    If I zip up my JS, CSS, and Images into 1 file (so jQuery.js, UI.js, styles.css, print.css, siteChrome1..6.png, siteIcons1..30.png) on say November 15th, 2009

    Then I am all set… 40 HTTP requests down to 1!! – Wicked!

    But a few days later, I need to tweak my style.css file.

    If I re-zip all the files, the zip file gets a new datestamp of the 17th… as does the internal style.css.

    Would the browser be smart enough to get the zip… determine that 39 of the internal files are unchanged, and only download and update the style.css file?

    My other question is about maintaining current behavior. Although the resources are bundled, I would hope that the browser would still block rendering when a script tag is encountered… and execute the script as it does today. If this became an async execution in parallel with rendering this would break a lot of sites.

    Thanks

  10. You’ve addressed my concerns about the final file size.

    But this proposal still raises more problems than it solves:

    – breaks the browser cache
    – breaks HTTP content negotiation (and is consequently not future-proof: user agents won’t be able to download the best file/encoding… I’m thinking in particular of bz2 compression, which is slowly starting to replace gzip)

    All these issues have already been addressed by HTTP pipelining, which requires zero modification of existing pages. It would make more sense to fix HTTP pipelining support.
    Implementing a software layer that overlaps with the network layer doesn’t seem like the right thing to do.

  11. Thanks for all the comments. I’m glad that people are thinking about this and trying to find holes and improvements. That’s the goal. Great that Alexander put in the effort to get this out there.

    @Markus: If the resource package’s zip file is a different URL (or expired), the browser will re-download it. Resources in that zip file would be written to cache with the same HTTP response headers as the zip file, regardless of whether they were already in the cache. This is where thoughtfulness on the part of the developer is needed – for example, don’t put things in the resource package that change frequently. As Andreas mentions, a similar issue exists today for concatenating scripts and creating image sprites.

    @anon: Jake’s got the answer.

    @Andy Davies: Great question regarding multipart. The main issue is developers have to change their HTML when they switch to multipart. Because not all browsers support multipart, you have to have different HTML. If multipart is embedded in the HTML doc, it can’t be cached. If it’s included as a separate response (for caching), then the browser has to wait for that entire response before it can start acting. Resource packages don’t have any of these problems.

    @Steve: See my response to Markus. Adding more conventions to get partial downloads might be possible, but it starts requiring multiple requests or the ability for the browser to notify the server about cache state. I added a new column to the summary stats in the spreadsheet that shows the average length of time since the resource was Last-Modified is 196 days. Certainly some of these are shorter than that, and it only takes one to change to cause the entire zip to be recreated, but there’s plenty of room to define resource packages that have a reasonable length of validity (week or more). Scripts would be handled just as they are today regarding serial execution.

    @Ywg: This doesn’t break the cache. The zip file as well as the individual files are cached following the HTTP spec. Resources that need content negotiation could be excluded from the resource package. It’s important to keep in mind that not *every* resource in the page has to be or should be placed in a resource package. Take a look at Alexander’s original post for why pipelining, although it’s existed for years, is not supported by most servers and browsers.

  12. @souders — OK, I can agree that *maybe* devs will be smart (and unlazy) enough to create their resource packages in a smart enough way that other non-packaged resources will be able to download in parallel at the same time, and still make good use of the browser’s bandwidth. And also, yes it’s possible they’ll be smart enough not to force down more content before domready than is really necessary.

    But notice that these are some huge assumptions about everyone understanding *and* adopting possibly complex best-practices. It’s being awfully optimistic about the general UI dev community, isn’t it?

    In reality, the potential for this approach to lead to anti-patterns is a big concern for me. Rather than bog down this comment thread, I’ve put up this blog post detailing some of my thoughts/responses:

    Resource Packaging? Silver Bullet?

  13. @Steve:
    Yeah, probably the best idea is to KISS – keep it simple, stupid. So you would only put resources into the package that have a Last-Modified more than 30 days old. I think I read in one of your books that even if it’s still fresh, most content gets flushed from the browser cache within 30 days anyway due to the cache’s limited size, so a full GET would not make much of a difference.

  14. You mention that bundling is available today, but don’t mention to what degree the bundling available today would help the sites you examined. My suspicion is that you can achieve nearly the same gains without resource bundles using existing techniques.

    You say that you suspect decreased parallelism is not a big issue. But in our testing when tweaking Chromium’s network layer we found that there were enormous effects when changing parallelism, which varied due to things like packet loss, the overall connection speed, latency to different resource servers, etc.– not the kind of situation where I’d be willing to wave my hands and say “this won’t matter”.

    Putting a manifest first in a bundle seems important to prevent this technique from backfiring, but just zipping up a bunch of files doesn’t guarantee the manifest will come first.

    Overall, as I hinted on the webkit-dev thread about resource bundles, I see this as providing little advantage over current techniques, but being riddled with ways to hurt yourself as an author.

    I would very much like to see some of the websites you’ve analyzed tested in a matrix. On one axis, we have the site markup style: as-is, using existing bundling techniques “to best effect”, and using resource bundles “to best effect”. On the other axis, different connection bandwidth/latency/packet loss combinations. Then we check actual page load time. This would give some idea of the maximal performance gains for resource bundles over existing techniques.

  15. Have a smarter web server and HTTP. If the HTTP server had a model of the whole document, not just the individual files (resources) but what contains or refers to what, then all resources could be returned in one HTTP request. E.g. “GET /*” returns not just index.html, but index.html and all resources it includes or links to in one (compressed) package. The client could also include some information about what resources it already has cached, and the server could only return those that it is missing or which have changed. Anyone know if they are considering anything like this for a future version of HTTP?

  16. @fool — i think that makes perfect sense. i endorse that as an idea that we should explore. Hmmm, I wonder how we get the browsers to understand all the files coming back in one request (not just multi-part, but just in a stream) as a result of one GET. That way all the people complaining (and rightfully so) about additional request overhead could, at least significantly, be quieted. :)

  17. Are the sizes in the ‘before’ column taking into account gzipping or not? I would imagine that most of the text/html and text/css resources are already being served gzip’d up.

  18. “This doesn’t break the cache. The zip file as well as the individual files are cached following the HTTP spec.”

    They may be treated individually, but they all share the same caching directives as their package. Wouldn’t it be more effective to put the caching headers in the manifest?

    As for content negotiation, if you must keep those resources out of the package, that automatically excludes text-based files (js, css, svg…). The best use would be to package images and concatenate css/js so they can benefit from gzip/bz2/some to-be-invented uber compression method.

  19. @Peter: Even with resource packages, there’s still parallel downloading. It’s not an either-or situation. That’s why I’m not concerned – because we’ll still have that opportunity. The manifest file will be first, as required in the proposed spec. More detailed testing would be great – but we need the browser support first (and a latency lab, and a huge amount of time).

    @James: The “individual file size” column is the sum of the number of bytes received. In most cases the .js and .css were gzipped. This also includes headers; that’s the main reason the zip file is smaller – only one set of headers. As an example, the average size of headers (request plus response) for Bing is 300 bytes per request.

  20. I see this as just another useful addition to a webdev’s toolbox – especially when other browsers start to adopt this technique. I’m also quite certain that it’s nice for website features like themes on certain pages… there may be many themes (consisting of CSS, images, etc.), but they probably won’t change all the time (and simple resource versioning/revving helps in those cases).

  21. This sounds like an interesting idea. I don’t see anything wrong with it. It’s ironic, though, that the faster the web gets, the more we need to economize.

  22. The problem with a single zip file is that we won’t have progressive rendering of the page; users will see a naked HTML page for a little while, and that could be many seconds on slower PCs and connections.
    But this could be solved: CSS files should be zipped separately from images and JS.
    BTW, I like the idea – the growth in HTTP requests is a problem for many sites.

  23. This is a very old, but good, idea.

    Netscape played with this idea back in 1998:
    http://docs.sun.com/source/816-6164-10/signscpt.htm

  24. Have you considered adding an optional hash to the link element?

    It just seems to me that with such a hash, a supporting browser could choose to reuse identical resource packages downloaded and cached from other sites, e.g. a specific zip of jQuery, TinyMCE, and so forth. That way, it could even eliminate HEAD hits on the resource package a lot of the time.

    Extending this further, it could also ask peers on the LAN for the same hash; ask a friendly CDN or cloud; or even some extravagant BitTorrent-esque P2P net.

    Of course, the hash would have to be very strong.

  25. @Steve: What is the overhead of unzipping on the client, let’s say if it’s a huge page and a slow client? I guess it shouldn’t be a problem, but I thought I’d ask anyway.

  26. @Carolo: Progressive rendering still happens. The browser doesn’t wait for the entire zip file to download. It downloads the initial manifest.txt file (small) and then proceeds to parse and render the rest of the page. When it encounters an external resource (script, stylesheet, image, etc.) that is listed in the manifest, it gets that from the zip file. Files are extracted from the zip file as they arrive – again, the browser doesn’t have to wait for the entire zip file to download to extract files. So, if there’s an image at the top of the page and that image is placed early in the zip file, the page would progressively render even faster (since the request for the zip file starts near the top of the page).

    @Tom: I’ve heard other people talk about the hash idea. It’s orthogonal to resource packages. Even without resource packages, the hash idea would be beneficial (jQuery across different sites, as you mention).

    @Per: I haven’t seen hard numbers, but assume this is tiny compared to the time saved.

  27. I’m sure site developers, webmasters, etc. would love this feature and really ‘work with it’, as it could free up CPU time (and possibly bandwidth too) on their servers.

    I guess, as a lot of people have pointed out, if a picture changes or a line of code is altered, then you have to get the whole RP again… seems like sites should split things into static content and dynamic content to avoid this pitfall.

  28. @steve

    Not sure I understand “how files will be extracted from the zip as they arrive”

    In the zip format the directory is the last thing in the archive, so would the proposal rely on ignoring this and just picking up the header for each entry instead?

  29. @Andy: That’s what the manifest.txt is for.

  30. @Steve (et al): how do we guarantee that the manifest.txt is the first file in the .zip file?

  31. List it first? I know the spec says the order of the directory might not be the order in the zip. Have you seen that the file listed first isn’t included first? Does that vary by tool? I don’t have a tool to examine the streaming bytes of a zip file. How do you verify this?
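
    One rough way might be to read the first local file header directly, e.g. with a few lines of Python (a sketch, untested against every zipping tool; it assumes a plain zip with no self-extracting prefix):

    # sketch: print the name of the physically-first entry in pkg.zip
    import struct

    with open("pkg.zip", "rb") as f:
        header = f.read(30)
        assert header[:4] == b"PK\x03\x04", "doesn't start with a local file header"
        name_len = struct.unpack("<H", header[26:28])[0]
        print(f.read(name_len).decode("utf-8"))   # should print "manifest.txt"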

  32. I would love the day I can use something like this on my work.

  33. I am not an expert on ZIP files, but the PKZIP specification contains the following:

    V. General Format of a .ZIP file
    Files stored in arbitrary order.

    There are probably ZIP compressor clients that allow you to specify which file comes first in the archive, but doesn’t this make the whole idea a lot more complicated? Wouldn’t it be better to just include this information in the HTML itself?

    Adding manifest.txt as a separate file doesn’t seem smart, since it would then introduce yet another HTTP request.

  34. I think the manifest.txt info should be inlined in the response that the HTTP request gives, as a set of X-Headers, for instance.

  35. @Berry: The manifest.txt is the first file in the zip – it’s not a separate HTTP request.

    @Kyle: Most of the discussion has been around inlining the manifest.txt in HTML. Doing it as header would work, too.

  36. As fool said, that could be handled at the HTTP level.

    HTTP can already gzip-encode any document at the transport level. Besides some brokenness in CSS and JS decompression that’s mostly history in IE (IE6 is history, isn’t it?), we take it for granted that we can deploy (or dynamically generate) any plain-text file on a web server and let the HTTP server gzip it. I can’t see the CPU and memory impact of gzipping on my servers today; it’s a tiny fraction of the work of a frontend. And it’s a great and automatic benefit for everybody and for strategic types of document.

    HTTP could also solve the problem of fetching multiple documents at once. The whole idea is to spare round trips, isn’t it? We could preserve all HTTP semantics (one URI per document, caching, headers, etc.) if we could send an HTTP GET that bundles many URIs and expects, for instance, a MIME-encapsulated response with headers + body for every requested document.

    I don’t think that could be handled by globs like ‘GET /*.js’; this would be too complicated to specify and implement (the browser would not know the distinct resource names in advance, for a start). And obviously it would have to be somewhat URL-encoded to reuse all the infrastructure we have. Maybe is not that crazy.

    The bundle.zip has the advantage of letting the developer explicitly choose and name what’s bundled together. With a ‘multiple document’ GET request, we would have to let the browser decide how to bundle requests (although it might be smarter than a webdev on average :)).

    It also leverages only client-side support and no protocol changes. Obviously this is way easier to implement and foster.

    The bundle.zip has the disadvantage of providing poor or no metadata: zip will store the modification date and that’s pretty much it, nothing like HTTP headers. That could be painful: for bundles, the browser will map an extension to a MIME type, which will not necessarily be the same mapping as the one picked by the server. Going back and forth between single and bundled resources will not always be easy and transparent.

    Support for “get multiple” would require basically the same code base as MHTML on the client side, I guess. On the server side this is a kind of ‘HTTP pipelining’, but without the async nightmare: there’s one session for many documents, but first the client talks, then the server answers, and done. The server side can be prototyped easily with a simple CGI.

  37. The blog swallowed my tag; let’s use another notation: “maybe |link rel=”bundle” type=”multipart/mixed” href=”/site.js /tracker.js /foo.js”| is not that crazy”.

  38. I like this idea very much. However, I think it should become a part of the HTTP spec rather than be implemented via HTML tags.

    This would guarantee adoption of this proposal WITHOUT needing anyone to change any of their pages. As soon as the HTTP server is upgraded all customers with upgraded browsers will see the performance benefits.

    Web servers would become responsible for generating another HTTP header (similar to content negotiation) which defines the resources that the current page depends upon. This would also guarantee that all dependencies are resolved BEFORE the page is loaded.

    Example HTTP header:

    Accept-package: type=application/zip; href=/pkg.zip;

    This approach allows for flexibility in compression types so, for example we could also have:

    Accept-package: type=application/gzip; href=/pkg.gz;

    Additionally, this scheme would also allow the manifest to be fetched inline. So we don’t have to worry about whether the manifest is first in the zip file or not:

    Accept-package: type=application/zip; href=/pkg.zip; manifest=js/jquery.js; js/main.js; css/reset.css; css/grid.css; css/main.css; images/logo.gif; images/navbar.png

    Or, if that’s too messy the manifest could be fetched in another href:

    Accept-package: type=application/zip; href=/pkg.zip; manifest-href=/manifest.txt

    but that would require an extra HTTP request!

    Thoughts anyone?

    Shiraz

  39. What would it take to create a client-side script that would look in the cache, see which files it needs, send that to the server, and have a server-side script compile anything new into a zip on-the-fly to send to the client?

    Would zipping a file take too much time? Is it possible to zip-compress each file beforehand, then use scripts to combine the file pieces together with some zip headers and footers?
    (Yeah, I don’t know exactly how zipping works.)

    The idea of downloading an entire zip file over again when something changes is terrible. That has to have a work-around.