Self-updating scripts

May 22, 2012 11:53 am | 33 Comments

Updates: Philip Tellis deployed this as part of Log-Normal’s snippet and found two problems. One is fixed and the other is still being investigated:

  • The previous code had a race condition when beacon.js called document.body.appendChild before BODY existed. This was fixed.
  • Some users reported the update.php iframe was opened in a new tab in IE8. I can’t repro this bug but am investigating.

Analyzing your site using Page Speed or YSlow often produces lower scores than you might expect due to 3rd party resources with short cache times. 3rd party snippet owners use short cache times so that users receive updates in a timely fashion, even if this means slowing down the site owner’s page.

Stoyan and I were discussing this and wondered if there was a way to have longer cache times and update resources when necessary. We came up with a solution. It’s simple and reliable. Adopting this pattern will reduce unnecessary HTTP requests resulting in faster pages and happier users, as well as better Page Speed and YSlow scores.

Long cache & revving URLs

Caching is an important best practice for making websites load faster. (If you’re already familiar with caching and 304s you may want to skip to the self-updating section.) Caching is easily achieved by giving resources an expiration date far in the future using the Cache-Control response header. For example, this tells the browser that the response can be cached for 1 year:

Cache-Control: max-age=31536000

But what happens if you make changes to the resource before the year is over? Users who have the old version in their cache won’t get the new version until the resource expires, meaning it would take 1 year to update all users. The simple answer is for the developer to change the resource’s URL. Often this is done by adding a “fingerprint” to the path, such as the source control version number, file timestamp, or checksum. Here’s an example for a script from Facebook:

http://static.ak.fbcdn.net/rsrc.php/v1/yx/r/N-kcJF3mlg6.js
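A checksum fingerprint of this kind could be generated at build time roughly as follows. This is a minimal Node.js sketch, not Facebook's actual scheme: the helper name fingerprintUrl, the 10-character hash length, and the /js/main.js path are all assumptions made for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical helper: insert a short content checksum before the file
// extension, so any change to the file contents yields a new URL.
function fingerprintUrl(path, contents) {
  const hash = crypto.createHash('sha256').update(contents).digest('hex').slice(0, 10);
  const dot = path.lastIndexOf('.');
  const slash = path.lastIndexOf('/');
  if (dot <= slash) {
    // No extension in the last path segment: just append the checksum.
    return path + '.' + hash;
  }
  return path.slice(0, dot) + '.' + hash + path.slice(dot);
}

const v1 = fingerprintUrl('/js/main.js', 'alert(1);');
const v2 = fingerprintUrl('/js/main.js', 'alert(2);');
console.log(v1, v2); // two different URLs for the same path
```

Because the URL depends only on the contents, an unchanged file keeps its URL (and stays cached) across releases.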

It’s likely that if you compare the resource URLs for major websites over time you’ll see these fingerprints changing with each release. Using the HTTP Archive we see how the URL changes for Facebook’s main script:

  • http://static.ak.fbcdn.net/rsrc.php/v1/y2/r/UVaDehc7DST.js (March 1)
  • http://static.ak.fbcdn.net/rsrc.php/v1/y-/r/Oet3o2R_9MQ.js (March 15)
  • http://static.ak.fbcdn.net/rsrc.php/v1/yS/r/B-e2tX_mUXZ.js (April 1)
  • http://static.ak.fbcdn.net/rsrc.php/v1/yx/r/N-kcJF3mlg6.js (April 15)

Facebook sets a 1 year cache time for this script, so when they make changes they rev the URL to make sure all users get the new version immediately. Setting long cache times and revving the URL is a common solution for websites focused on performance. Unfortunately, this isn’t possible when it comes to 3rd party snippets.

Snippets don’t rev

Revving a resource’s URL is an easy solution for getting updates to the user when it comes to the website’s own resources. The website owner knows when there’s an update and since they own the web page they can change the resource URL.

3rd party snippets are a different story. In most cases, 3rd party snippets contain the URL for a bootstrap script. For example, here’s the Tweet Button snippet:

<a href="https://twitter.com/share" class="twitter-share-button"
data-lang="en">Tweet</a>
<script>
!function(d,s,id){
    var js,fjs=d.getElementsByTagName(s)[0];
    if(!d.getElementById(id)){
        js=d.createElement(s); js.id=id;
        js.src="//platform.twitter.com/widgets.js";
        fjs.parentNode.insertBefore(js,fjs);
}}(document,"script","twitter-wjs");
</script>

Website owners paste this snippet code into their pages. In the event of an emergency update, the Twitter team can’t rev the widgets.js URL because they don’t have access to change all the web pages containing this snippet. Notifying all the website owners to update the snippet isn’t an option, either. Since there’s no way to rev the URL, bootstrap scripts typically have a short cache time to ensure users get updates quickly. Twitter’s widgets.js is cacheable for 30 minutes, Facebook’s all.js is cacheable for 15 minutes, and Google Analytics’ ga.js is cacheable for 2 hours. This is much shorter than the recommended practice of setting the expiration date a month or more in the future.

Conditional GETs hurt performance

Unfortunately, these short cache times for bootstrap scripts have a negative impact on web performance. When the snippet’s resource is requested after the cache time has expired, instead of reading the resource from cache the browser has to issue a Conditional GET request (containing the If-Modified-Since and If-None-Match request headers). Even if the response is a simple 304 Not Modified with no response body, the time it takes to complete that roundtrip impacts the user experience. That impact varies depending on whether the bootstrap script is loaded in the normal way vs. asynchronously.
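To make the Conditional GET exchange concrete, here is a rough sketch of how a server might choose between 200 and 304. The function name, the resource object shape, and the sample values are assumptions for illustration, and the ETag comparison is simplified to an exact match (no weak validators or lists).

```javascript
// Hypothetical server-side helper: decide between 200 and 304 for one resource.
function respondToConditionalGet(requestHeaders, resource) {
  const ifNoneMatch = requestHeaders['if-none-match'];
  const ifModifiedSince = requestHeaders['if-modified-since'];

  let notModified = false;
  if (ifNoneMatch !== undefined) {
    // If-None-Match takes precedence over If-Modified-Since when both are sent.
    notModified = ifNoneMatch === resource.etag;
  } else if (ifModifiedSince !== undefined) {
    const since = Date.parse(ifModifiedSince);
    // HTTP dates have one-second resolution, so compare whole seconds.
    notModified = !Number.isNaN(since) &&
      Math.floor(resource.lastModified / 1000) <= Math.floor(since / 1000);
  }

  const validators = {
    'ETag': resource.etag,
    'Last-Modified': new Date(resource.lastModified).toUTCString(),
  };
  if (notModified) {
    // No body is sent, but the browser still paid for the roundtrip.
    return { status: 304, headers: validators, body: '' };
  }
  return { status: 200, headers: validators, body: resource.body };
}

const script = {
  etag: '"abc123"',
  lastModified: Date.UTC(2012, 4, 22, 11, 53, 0),
  body: '/* widgets */',
};
const fresh = respondToConditionalGet({ 'if-none-match': '"abc123"' }, script);
const changed = respondToConditionalGet({ 'if-none-match': '"old"' }, script);
console.log(fresh.status, changed.status); // 304 200
```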

Loading scripts the “normal way” means using HTML: <script src="..."></script>. Scripts loaded this way have several negative impacts: they block all subsequent DOM elements from rendering, and in older browsers they block subsequent resources from being downloaded. These negative impacts also happen when the browser makes a Conditional GET request for a bootstrap script with a short cache time.

If the snippet is an async script, as is the case for widgets.js, the negative impact is reduced. In this case the main drawback impacts the widget itself – it isn’t rendered until the response to the Conditional GET is received. This is disconcerting to users because they see these async widgets popping up in the page after the surrounding content has already rendered.

Increasing the bootstrap script’s cache time reduces the number of Conditional GET requests which in turn avoids these negative impacts on the user experience. But how can we increase the cache time and still get updates to the user in a timely fashion without the ability to rev the URL?

Self-updating bootstrap scripts

A bootstrap script is defined as a 3rd party script with a hardwired URL that can’t be changed. We want to give these scripts long cache times so they don’t slow down the page, but we also want the cached version to get updated when there’s a change. There are two main problems to solve: notifying the browser when there’s an update, and replacing the cached bootstrap script with the new version.

update notification: Here we make the assumption that the snippet is making some subsequent request to the 3rd party server for dynamic data, to send a beacon, etc. We piggyback on this. In the case of the Tweet Button, there are four requests to the server: one for an iframe HTML document, one for JSON containing the tweet count, and two 1×1 image beacons (presumably for logging). Any one of these could be used to trigger an update. The key is that the bootstrap script must contain a version number. That version number is then passed back to the snippet server in order for it to detect if an update is warranted.
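The server-side half of this check might look something like the sketch below. It assumes the version is a number that only increases (for example a release timestamp). LATEST_VERSION, beaconResponse, and startUpdate are hypothetical names, with startUpdate standing in for whatever update-triggering JavaScript the snippet owner returns.

```javascript
// Hypothetical: the version the snippet server currently wants clients to run.
const LATEST_VERSION = 1337703857;

// Decide what the beacon response should be for a given client version.
function beaconResponse(clientVersion, latestVersion) {
  const v = Number(clientVersion);
  if (Number.isFinite(v) && v >= latestVersion) {
    // Up to date: an empty response is enough.
    return { status: 204, body: '' };
  }
  // Out of date (or unparseable): return JavaScript that triggers the update.
  return {
    status: 200,
    body: 'startUpdate(' + JSON.stringify(String(clientVersion)) + ');',
  };
}

console.log(beaconResponse('1337703857', LATEST_VERSION).status); // 204
console.log(beaconResponse('1337700000', LATEST_VERSION).status); // 200
```

Treating an unparseable version as out of date is a design choice in this sketch; a snippet owner could just as reasonably ignore such beacons.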

replacing the cached bootstrap script: This is the trickier part. If we give the bootstrap script a longer cache time (which is the whole point of this exercise), we have to somehow overwrite that cached resource even though it’s not yet expired. We could dynamically re-request the bootstrap script URL, but it’ll just be read from cache. We could rev the URL by adding a querystring, but that won’t overwrite the cached version with the hardwired URL referenced in the snippet. We could do an XHR and modify the caching headers using setRequestHeader, but that doesn’t work across all browsers.

Stoyan hit on the idea of dynamically creating an iframe that contains the bootstrap script, and then reloading that iframe. When the iframe is reloaded it’ll generate a Conditional GET request for the bootstrap script (even though the bootstrap script is cached and still fresh), and the server will respond with the updated bootstrap script, which overwrites the old one in the browser’s cache. We’ve achieved both goals: a longer cache time while preserving the ability to receive updates when needed. And we’ve replaced numerous Conditional GET requests (every 30 minutes in the case of widgets.js) with only one Conditional GET request when the bootstrap script is actually modified.

An example

Take a look at this Self-updating Scripts example modeled after Google Analytics. This example contains four pages.

page 1: The first page loads the example snippet containing bootstrap.js:

(function() {
    var s1 = document.createElement('script');
    s1.async = true;
    s1.src = 'http://souders.org/tests/selfupdating/bootstrap.js';
    var s0 = document.getElementsByTagName('script')[0];
    s0.parentNode.insertBefore(s1, s0);
})();

Note that the example is hosted on stevesouders.com but the snippet is served from souders.org. This shows that the technique works for 3rd party snippets served from a different domain. At this point your browser cache contains a copy of bootstrap.js which is cacheable for 1 week and contains a “version number” (really just a timestamp). The version (timestamp) of bootstrap.js is shown in the page, for example, 16:23:53. A side effect of bootstrap.js is that it sends a beacon (beacon.js) back to the snippet server. For now the beacon returns an empty response (204 No Content).
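The example's bootstrap.js source isn't reproduced here, but the version-plus-beacon part could be sketched along these lines. The snippet.example.com host, the v query parameter name, and the function names are assumptions, not the example's actual code.

```javascript
// Hypothetical version stamp baked into the file when it is deployed.
var BOOTSTRAP_VERSION = '16:23:53';

// Build the beacon URL, passing the cached script's version back to the server.
function beaconUrl(base, version) {
  return base + '?v=' + encodeURIComponent(version);
}

// Load the beacon as an async script so the server can answer with JavaScript.
function sendVersionBeacon(doc, version) {
  var s = doc.createElement('script');
  s.async = true;
  s.src = beaconUrl('//snippet.example.com/beacon.js', version);
  var s0 = doc.getElementsByTagName('script')[0];
  s0.parentNode.insertBefore(s, s0);
  return s;
}

// Only runs in a browser; the guard lets the helpers be exercised elsewhere.
if (typeof document !== 'undefined') {
  sendVersionBeacon(document, BOOTSTRAP_VERSION);
}
```

Passing the document in as a parameter is only there to make the helper easy to test with a stand-in object.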

page 2: The second page just loads the snippet again to confirm we’re using the cache. This time bootstrap.js is read from cache, so the timestamp should be the same, e.g., 16:23:53. A beacon is sent but again the response is empty.

page 3: Here’s where the magic happens. Once again bootstrap.js is read from cache (so you should see the same version timestamp again). But this time when it sends the beacon the server returns a notification that there’s an update. This is done by returning some JavaScript inside beacon.js:

(function() {
  var doUpdate = function() {
    if ( "undefined" === typeof(document.body) || !document.body ) {
      setTimeout(doUpdate, 500);
    }
    else {
      var iframe1 = document.createElement("iframe");
      iframe1.style.display = "none";
      iframe1.src = "http://souders.org/tests/selfupdating/update.php?v=[ver #]";
      document.body.appendChild(iframe1);
    }
  };
  doUpdate();
})();

The iframe src points to update.php:

<html>
<head>
<script src="http://souders.org/tests/selfupdating/bootstrap.js"></script>
</head>
<body>
<script>
if (location.hash === '') {
    location.hash = "check";
    location.reload(true);
}
</script>
</body>
</html>

The two key pieces of update.php are a reference to bootstrap.js and the code to reload the iframe. The location hash property is assigned a string to avoid reloading infinitely. The best way to understand the sequence is to look at the waterfall chart.

This page (newver.php) reads bootstrap.js from cache (1). The beacon.js response contains JavaScript that loads update.php in an iframe and reads bootstrap.js from cache (2). But when update.php is reloaded it issues a request for bootstrap.js (3) which returns the updated version and overwrites the old version in the browser’s cache. Voilà!

page 4: The last page loads the snippet once again, but this time it reads the updated version from the cache, as indicated by the newer version timestamp, e.g., 16:24:17.

Observations & adoption

One observation about this approach is that the updated version is used the next time the user visits a page that needs the resource (similar to the way app cache works). We saw this in page 3, where the old version of bootstrap.js was used in the snippet and the new version was downloaded afterward. With the current typical behavior of short cache times and many Conditional GET requests, the new version is used immediately. However, it’s also true with the old approach that if an update occurs while a user is in the middle of a workflow, the user won’t get the new version for 30 minutes or 2 hours (or whatever the short cache time is), whereas with the new approach the user would get the update as soon as it’s available.

It would be useful to do a study about whether this approach increases or decreases the number of beacons with an outdated bootstrap script. Another option is to always check for an update. This would be done by having the bootstrap script append and reload the update.php iframe when it’s done. The downside is this would greatly increase the number of Conditional GET requests. The plus side is the 3rd party snippet owner doesn’t have to deal with the version number logic.

An exciting opportunity with this new approach is to treat update.php as a manifest list. It can reference bootstrap.js as well as any other resources that have long cache times but need to be overwritten in the browser’s cache. It should be noted that update.php doesn’t need to be a dynamic page – it can be a static page with a far future expiration date. Also, the list of resources can be altered to reflect only the resources that need to be updated (based on the version number received).
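As a rough illustration of the manifest idea, a server-side generator for the update page might look like this. The resource list, the integer version scheme, and the function names are hypothetical; only the reload guard mirrors the update.php markup shown earlier.

```javascript
// Hypothetical manifest: each entry records the version in which it last changed.
const RESOURCES = [
  { url: '//snippet.example.com/bootstrap.js', changedIn: 3 },
  { url: '//snippet.example.com/widget.js', changedIn: 2 },
  { url: '//snippet.example.com/helpers.js', changedIn: 1 },
];

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;').replace(/</g, '&lt;');
}

// Build the update page, listing only resources newer than the client's version.
function buildUpdatePage(resources, clientVersion) {
  const tags = resources
    .filter((r) => r.changedIn > clientVersion)
    .map((r) => '<script src="' + escapeAttr(r.url) + '"></script>')
    .join('\n');
  return [
    '<html>', '<head>', tags, '</head>', '<body>',
    '<script>',
    "if (location.hash === '') {",
    '    location.hash = "check";',
    '    location.reload(true);',
    '}',
    '</script>', '</body>', '</html>',
  ].join('\n');
}

// A client on version 1 gets bootstrap.js and widget.js, but not helpers.js.
console.log(buildUpdatePage(RESOURCES, 1));
```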

A nice aspect of this approach is that existing snippets don’t need to change. All of the changes necessary to adopt this self-updating behavior are on the snippet owner’s side:

  • Add a version number to the bootstrap script.
  • Pass back the version number to the snippet server via some other request that can return JavaScript. (Beacons work – they don’t have to be 1×1 transparent images.)
  • Modify that request handler to return JavaScript that creates a dynamic iframe when the version is out-of-date.
  • Add an update.php page that includes the bootstrap script (and other resources you want to bust out of cache).
  • Increase the cache time for the bootstrap script! 10 years would be great, but going from 30 minutes to 1 week is also a huge improvement.
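For the last step, the max-age values involved are simple arithmetic. The helper below is a small sketch (the name cacheControl is an assumption) that turns a number of days into a Cache-Control value.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Build a Cache-Control header value for a cache time given in days.
function cacheControl(days) {
  return 'max-age=' + Math.round(days * SECONDS_PER_DAY);
}

console.log(cacheControl(7));   // max-age=604800
console.log(cacheControl(365)); // max-age=31536000
```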

If you own a snippet I encourage you to consider this self-updating approach for your bootstrap scripts. It’ll produce faster snippets, a better experience for your users, and fewer requests to your server.

 

33 Responses to Self-updating scripts

  1. You probably also want to make sure the resources are marked cache-control: private so that intermediate proxies don’t just serve the cached version to the refreshed IFrame. Might be a little tricky to pull off and still get your CDN to cache it but should be possible.

  2. One can always rely on Pat to spot the edge cases! :)

  3. Thank you for this article, Steve, and for your nice idea, Stoyan. It’s a little hard to understand everything at first read but the achievement opens the doors to a faster Internet. I can’t wait to see Twitter, Facebook and others doing this.

    In some cases, the third-party service doesn’t use any beacon. Then, perhaps the bootstrap.js file should contain its own code generating the frame, right?

  4. Is this achieving the same behavior that could be obtained if browsers supported the `stale-while-revalidate` cache-control option? e.g.

    cache-control: max-age=0, stale-while-revalidate=604800

    Also, I was under the impression that caches are allowed to serve stale content while revalidating unless the `must-revalidate` option is set. Not sure that any of browser caches do this, but I believe it is quite common on proxies and CDNs.

  5. Hi Steve,

    I wonder why your example doesn’t work in my Firefox 13 (in Ubuntu), here’re my screenshots with HttpFox trace

    first page — http://www.freeimagehosting.net/yx9zh

    last page — http://www.freeimagehosting.net/l3ie3

    Thanks.

  6. What about storing the version number in a cookie to avoid having to create an iframe?

    *bootstrap*

    v = parseCookie('version');
    s1.src = '//example.com/app.js?v=' + v;

    *beacon*
    setCookie('version', '10.0.0')
    doc.body.appendChild(doc.createElement('script')).src = '//example.com/app.js?v=10.0.0'

    Performs better on mobile and slightly better on desktop

  7. Pat,

    The following header combo should allow browsers to cache for a long time, while instructing intermediate proxies to refresh more often:

    Cache-Control: max-age=31536000, s-maxage=1800

    Obviously, no one can guarantee that the behavior of all intermediate proxies follows the spec and therefore honors this value, so your “private” idea is clearly safer.

    All the major CDNs support Cache-Control and TTL overrides, and some even support header injection, so getting a CDN to ignore, override, or inject these headers should be easy, especially for major CDN customers like Twitter, Facebook, etc.

  8. @Pat: Good point. Jason had a reply.

    @Jason: Thanks for the pointer to s-maxage.

    @Sean: Yes, stale-while-revalidate would fix this. I should have guessed MNot had thought of this.

    @Jerry: It looks like the iframe reload didn’t work. You could look for JS errors.

    @kwame: I think you can’t set cookies inside a cross-domain script.

  9. Well, it just so happens that we’ve had a version number in our beacons from the get go, and we’ve also been tracking which versions of our JavaScript the beacon comes from. Once I turn this on, I can get back to you with how the distribution changed.

    Intermediate proxies might still be a problem, but the experiment should confirm or deny that.

  10. Looks like you don’t need to set the iframe’s id. Better to skip it to avoid a collision with an existing id.

  11. Ah, great idea. I’ll put this into the mix while we sort out our front end (3rd party) SPOFs.

    Stuart

  12. Ok, this is live for our script :)

  13. Cool method! As I understand, the whole magic comes from forcing reload() which emulates app cache behavior on selected resources.

    I think there is a need for a widget boilerplate code with all the components coded up and simple instructions for triggering updates and testing if things work as expected.

    It would be great to have some field data regarding the stale-while-revalidate, this kind of tooling belongs there.

  14. Is this “iframe reload = revalidate” behavior consistent across all browsers? What about future browsers? If a third-party script becomes dependent on this method for getting out timely updates, and then a future browser decides that reloading an iframe should just pull from cache, the third-party scripts would be SOL. I’m trying to determine how hacky vs. specified it is to depend on this working everywhere.

    Very cool, though! This sounds super promising.

  15. Just one point about the example you have, it runs before onload fires, so on IE, the call to document.body.appendChild will fail with document.body is undefined.

  16. Thanks Steve! Going to work this into Video.js.

    In the beacon.js response that creates the iframe, is the query var on update.php (update.php?v=10:10:51) used for anything?

    For the IE body issue Philip pointed out in beacon.js, I might just use a quick and dirty dom ready check. AFAIK only firefox <3.6 doesn't support document.readyState.

    function makeIFrame() {
    var iframe1 = document.createElement("iframe");
    iframe1.style.display = "none";
    iframe1.src = "http://souders.org/tests/selfupdating/update.php?v=10:10:51";
    document.body.appendChild(iframe1);
    }

    if (document.readyState === "complete") {
    makeIFrame();
    } else if (window.addEventListener) {
    window.addEventListener("load", makeIFrame, false);
    } else if (window.attachEvent) {
    window.attachEvent("onload", makeIFrame);
    }

  17. This technique seems to rely on the behavior that if an iframe has been loaded, and is reloaded via JS, then the cache of the resources contained within the iframe is not respected and a force-download is issued, yet the freshly downloaded resources are now put into the cache for subsequent requests.

    I can’t imagine (a) why browsers would implement this behavior, (b) is this behavior consistent across browsers, and (c) are they likely to retain support for this behavior in the future. Seems a little too hacky for comfort without clarity about this.

    Interesting direction anyway. This could lead to interesting ideas.

  18. Just echoing what Rakesh says, it’s an amazing hack, but I’d feel nervous using it while reload(true) is missing from the spec http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#dom-location-reload

    But it’s such a great hack, I’d like to see pressure put on the whatwg to get this into the spec, or something in XHR that makes it ignore the current cache but saves the returned data into the cache (according to headers)

  19. Hi Steve, Stoyan,

    why not use srcless iframe instead of having to call another url (update.php)

    Did no cross browser testing so far, but it works so far… (see you at fluent ;)

    Benedikt

    var ifrm = document.createElement("iframe");
    ifrm.id = "ifrm";
    ifrm.style.display = "none";
    document.body.appendChild(ifrm);

    ifrm = (ifrm.contentWindow) ? ifrm.contentWindow : (("undefined" !== typeof ifrm.contentDocument) && ifrm.contentDocument.document) ? ifrm.contentDocument.document : ifrm.contentDocument;
    ifrm.document.open();
    ifrm.document.write('Hello World!');
    ifrm.document.write('');
    ifrm.document.write('console.log(location.hash);if (location.hash === "") {location.hash = "check";location.reload(true);}');
    ifrm.document.close();

  20. calling the script via script tag is missing in my post. (After Hello World, document.write(script with source)

  21. @Philip: Thanks for all the great feedback. I’m sorry for causing you problems. I removed the iframe1.id line. I fixed the bug with the document.body.appendChild race condition. I can’t reproduce the IE8 bug where update.php gets opened in a new tab. I’ll keep investigating that.

    @Sean, @Rakesh, @Jake: I think your concerns are unfounded, but it does require more explanation than I provided in the blog post. Your concerns hinge on the behavior of reload() wrt caching. Note that reload() is called wrt update.php. I mention that update.php doesn’t have to be dynamic – it could be a static HTML document. Regardless, the caching headers for update.php could be set to “must-revalidate”. With this change, even if reload(true) was ignored and the browser tried reading the document from cache, the “must-revalidate” caching header would cause it to check for a new version. If update.php returned a 200 response the browser would be in “reload mode” and would thus do at least a Conditional GET request for bootstrap.js. Worst case this is an extra step that would have to be added, but as of now every browser I’ve tested has done the reload with the behavior desired by calling reload(true) as described on MDN. Does that make sense?

    @SteveH: The version # is used so that “the list of resources can be altered to reflect only the resources that need to be updated” inside update.php. I like your fix for the document.body race condition, but I don’t have a strong preference for setTimeout vs adding a listener.

    @Benedikt: I tried using a srcless iframe but ran into cross-domain restrictions.

  22. @Jason, @Steve, one drawback with s-maxage: according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

    it forces the intermediary cache servers to revalidate

    s-maxage
    If a response includes an s-maxage directive, then for a shared cache (but not for a private cache), the maximum age specified by this directive overrides the maximum age specified by either the max-age directive or the Expires header. The s-maxage directive also implies the semantics of the proxy-revalidate directive (see section 14.9.4), i.e., that the shared cache must not use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server. The s-maxage directive is always ignored by a private cache.

  23. @Nick: We *do* want the shared cache to revalidate the bootstrap.js. Otherwise when update.php was reloaded the shared cache would just return the old version. The trick is deciding what the s-maxage value should be – it should probably be pretty short.

  24. I read the story with interest, because this is what we did at http://muscula.com, a service for logging JavaScript errors in production, that installs like google analytics.
    However we did not request any beacons, so we can’t use this method.
    Our bootstrap script was asynchronously loaded.
    I used the 2 hour cache-expiration (inspired from google), and the bootstrap script was a very minimal script that loaded the actual logging script. The actual logging script was url-revisioned and had an expires of several years.

  25. Ah. Didn’t consider that. Local storage would work well here and saving that tiny amount of data shouldn’t be problematic. No way to know how much impact the read operation is going to have though since it varies from page load to page load.

  26. Steve, I’ve blogged about our findings if you’re interested: http://www.lognormal.com/blog/2012/06/05/updating-cached-boomerang/

  27. Hi, Steve. In China, we have so many 3rd party scripts (ads).
    I translated this article into Chinese and hope it helps more people.

  28. Does anybody have a link to an actual implementation using Google Analytics, Facebook, or Twitter and using a static html page (not update.php). GA does not have a version number so I have a hard time replicating your example.

  29. @Dave have a look at my post (Comment #26) where I describe what we do at LogNormal. We also use a static HTML page. If you don’t have a version number in the beacon, there may be other parts of the request that you can use to create a fingerprint of the JavaScript version running.

    However, for ga, you can get the version number from `_gat._getTrackerByName()._getVersion()`

  30. @Philip Thanks for your response, but your article is still talking about using your boomerang script. Let’s consider GA, you get a snippet of code that loads ga.js with a max-age=43200. You can’t modify that header and you can’t add extra code to it like the bootstrap.js example here. So, I assume you are supposed to copy the code from ga.js, load it via your own server to give it a longer cache time, and then place an additional javascript file that checks for changes at GA, and if there are changes, it loads it via iframe, but I have not seen an example like this…

  31. @Dave I see what you mean. It’s a little more involved, but essentially the same.

    You’d put the GA code on your server, and then on the server side you’d check periodically for newer versions of GA. You’d update the version number requirement in your update.php script.

    On the client side, right after you load GA load up your own JavaScript. Your JavaScript checks the _getVersion() method of GA (see the GA docs) and if it is below what it expects, it loads up this code. You wouldn’t intercept the beacon in this case.

    You’d need to adapt this algorithm for FB and twitter using their appropriate version methods.

  32. Is it really a good idea to co-opt code owned by another party onto your own server? And would it even work (there are cross-origin issues possibly)? Even if you don’t run into technical problems, copying such code and changing the effective cache settings is just as evil as an over aggressive cache proxy — except you’d be subjecting users of your site, rather than the employees in your building.

    (p.s. A pun-based captcha? Cute but not very good for non-native English speakers.)

  33. Just one point about the example you have, it runs before onload fires, so on IE, the call to document.body.appendChild will fail with document.body is undefined.