Reloading post-onload resources

February 26, 2013 5:35 pm | 16 Comments

Two performance best practices are to add a far future expiration date and to delay loading resources (esp. scripts) until after the onload event. But it turns out that the combination of these best practices leads to a situation where it’s hard for users to refresh resources. More specifically, hitting Reload (or even shift+Reload) doesn’t refresh these cacheable, lazy-loaded resources in Firefox, Chrome, Safari, Android, and iPhone.

What we expect from Reload

The browser has a cache (or 10) where it saves copies of responses. If the user feels those cached responses are stale, she can hit the Reload button to ignore the cache and refetch everything, thus ensuring she’s seeing the latest copy of the website’s content. I couldn’t find anything in the HTTP Spec dictating the behavior of the Reload button, but all browsers have this behavior AFAIK:

  • If you click Reload (or control+R or command+R) then all the resources are refetched using a Conditional GET request (with the If-Modified-Since and If-None-Match validators). If the server’s version of the response has not changed, it returns a short “304 Not Modified” status with no response body. If the response has changed then “200 OK” and the entire response body is returned.
  • If you click shift+Reload (or control+Reload or control+shift+R or command+shift+R) then all the resources are refetched withOUT the validation headers. This is less efficient since every response body is returned, but guarantees that any cached responses that are stale are overwritten.
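The difference between the two kinds of reload can be sketched from the server's side. This is a simplified illustration, not any real server's implementation: the `respond` function and the shape of the `resource` object are hypothetical, and only a single ETag and a plain date comparison are handled.

```javascript
// Simplified sketch of how a server might answer a GET for one resource.
// `resource` is a hypothetical object: { etag, lastModified (Date), body }.
function respond(requestHeaders, resource) {
  const ifNoneMatch = requestHeaders['if-none-match'];
  const ifModifiedSince = requestHeaders['if-modified-since'];

  let notModified = false;
  if (ifNoneMatch !== undefined) {
    // When If-None-Match is present it takes precedence over If-Modified-Since.
    notModified = ifNoneMatch === resource.etag;
  } else if (ifModifiedSince !== undefined) {
    // HTTP dates have one-second resolution, so compare whole seconds.
    const since = Date.parse(ifModifiedSince);
    const modified = Math.floor(resource.lastModified.getTime() / 1000) * 1000;
    notModified = !Number.isNaN(since) && modified <= since;
  }

  return notModified
    ? { status: 304, body: '' }              // validators matched: no body
    : { status: 200, body: resource.body };  // no validators, or content changed
}

const logo = {
  etag: '"v1"',
  lastModified: new Date('2013-02-01T00:00:00Z'),
  body: 'image bytes',
};

// Reload: the browser sends validators taken from its cached copy.
const reload = respond({ 'if-none-match': '"v1"' }, logo);
// shift+Reload: no validators, so the full body always comes back.
const shiftReload = respond({}, logo);
```

Here `reload.status` is 304 with an empty body, while `shiftReload.status` is 200 with the full body.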

Bottom line: regardless of expiration dates, we expect that hitting Reload gets the latest version of the website’s resources, and that shift+Reload does so even more aggressively.

Welcome to Reload 2.0

In the days of Web 1.0, resources were requested using HTML markup – IMG, SCRIPT, LINK, etc. With Web 2.0 resources are often requested dynamically. Two common examples are loading scripts asynchronously (e.g., Google Analytics) and dynamically fetching images (e.g., for photo carousels or images below-the-fold). Sometimes these resources are requested after window onload so that the main page can render quickly for a better user experience, better metrics, etc. If these resources have a far future expiration date, the browser needs extra intelligence to do the right thing.

  • If the user navigates to the page normally (clicking on a link, typing a URL, using a bookmark, etc.) and the dynamic resource is in the cache, the browser should use the cached copy (assuming the expiration date is still in the future).
  • If the user reloads the page, the browser should refetch all the resources including resources loaded dynamically in the page.
  • If the user reloads the page, I would think resources loaded in the onload handler should also be refetched. These are likely part of the basic construction of the page and they should be refetched if the user wants to refresh the page’s contents.
  • But what should the browser do if the user reloads the page and there are resources loaded after the onload event? Some web apps are long lived with sessions that last hours or even days. If the user does a reload, should every dynamically-loaded resource for the life of the web app be refetched ignoring the cache?

An Example

Let’s look at an example: Postonload Reload.

This page loads an image and a script using five different techniques:

  1. markup – The basic HTML approach: <img src= and <script src=.
  2. dynamic in body – In the body of the page is a script block that creates an image and a script element dynamically and sets the SRC causing the resource to be fetched. This code executes before onload.
  3. onload – An image and a script are dynamically created in the onload handler.
  4. 1 ms post-onload – An image and a script are dynamically created via a 1 millisecond setTimeout callback in the onload handler.
  5. 5 second post-onload – An image and a script are dynamically created via a 5 second setTimeout callback in the onload handler.
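As a rough sketch, the five techniques might look like the following. This is not the test page's actual source: the `load`, `loadPair`, and `installTechniques` helpers and the file names are hypothetical, and `win` and `doc` are passed in as parameters so the snippet can also be exercised outside a browser. Technique 1 is plain markup and appears only as a comment.

```javascript
// Technique 1 (markup) is plain HTML, shown here for comparison:
//   <img src="image1.gif"> <script src="script1.js"></script>

// Hypothetical helper: create an element, set its src, and add it to the page.
function load(doc, tag, src) {
  const el = doc.createElement(tag);
  el.src = src;
  doc.body.appendChild(el);
  return el;
}

// Hypothetical helper: load one image and one script for technique `n`.
function loadPair(doc, n) {
  load(doc, 'img', 'image' + n + '.gif');
  load(doc, 'script', 'script' + n + '.js');
}

function installTechniques(win, doc) {
  // Technique 2: dynamic in body, runs while the page is still loading.
  loadPair(doc, 2);

  win.addEventListener('load', function () {
    // Technique 3: inside the onload handler.
    loadPair(doc, 3);
    // Technique 4: 1 ms after onload.
    win.setTimeout(function () { loadPair(doc, 4); }, 1);
    // Technique 5: 5 seconds after onload.
    win.setTimeout(function () { loadPair(doc, 5); }, 5000);
  });
}

// In a page, from a script block in the body:
//   installTechniques(window, document);
```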

All of the images and scripts have an expiration date one month in the future. If the user hits Reload, which of the techniques should result in a refetch? Certainly we’d expect techniques 1 & 2 to cause a refetch. I would hope 3 would be refetched. I think 4 should be refetched but doubt many browsers do that, and 5 probably shouldn’t be refetched. Settle on your expected results and then take a look at the table below.

The Results

Before jumping into the Reload results, let’s first look at what happens if the user just navigates to the page. This is achieved by clicking on the “try again” link in the example. In this case none of the resources are refetched. All of the resources have been saved to the cache with an expiration date one month in the future, so every browser I tested just reads them from cache. This is good and what we would expect.
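The normal-navigation case comes down to a freshness check. Below is a minimal sketch, assuming a hypothetical cache entry that records when the response was stored and the max-age it was served with. Real browser caches weigh more inputs than this (the Expires and Age headers, heuristics, and so on), and `farFutureHeaders` and `isFresh` are illustrative names, not the test server's code.

```javascript
const ONE_MONTH_SECONDS = 30 * 24 * 60 * 60; // "one month" approximated as 30 days

// Headers a server could send to make a response cacheable for a month.
function farFutureHeaders(now) {
  return {
    'Cache-Control': 'public, max-age=' + ONE_MONTH_SECONDS,
    'Expires': new Date(now.getTime() + ONE_MONTH_SECONDS * 1000).toUTCString(),
  };
}

// Hypothetical cache entry: { storedAt: Date, maxAgeSeconds: number }.
// A fresh entry can be used on normal navigation without contacting the server.
function isFresh(entry, now) {
  const ageSeconds = (now.getTime() - entry.storedAt.getTime()) / 1000;
  return ageSeconds < entry.maxAgeSeconds;
}

const cached = {
  storedAt: new Date('2013-02-26T00:00:00Z'),
  maxAgeSeconds: ONE_MONTH_SECONDS,
};
const oneWeekLater = isFresh(cached, new Date('2013-03-05T00:00:00Z'));
const fiveWeeksLater = isFresh(cached, new Date('2013-04-02T00:00:00Z'));
```

One week after the response is stored the entry is still fresh, so it is read from cache. Five weeks later it has expired and would need to be fetched or revalidated.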

But the behavior diverges when we look at the Reload results captured in the following table.

Table 1. Resources that are refetched on Reload (Y = refetched, - = not refetched)
technique        resource  Chrome 25  Safari 6  Android Safari/534  iPhone Safari/7534  Firefox 19  IE 8,10  Opera 12
markup           image 1   Y          Y         Y                   Y                   Y           Y        Y
                 script 1  Y          Y         Y                   Y                   Y           Y        Y
dynamic          image 2   Y          Y         Y                   Y                   Y           Y        Y
                 script 2  Y          Y         Y                   Y                   Y           Y        Y
onload           image 3   -          -         -                   -                   Y           Y        Y
                 script 3  -          -         -                   -                   -           Y        Y
1ms postonload   image 4   -          -         -                   -                   -           -        Y
                 script 4  -          -         -                   -                   -           -        Y
5sec postonload  image 5   -          -         -                   -                   -           -        -
                 script 5  -          -         -                   -                   -           -        -

The results for Chrome, Safari, Android mobile Safari, and iPhone mobile Safari are the same. When you click Reload in these browsers the resources in the page get refetched (resources 1&2), but not so for the resources loaded in the onload handler and later (resources 3-5).

Firefox is interesting. It loads the four resources in the page plus the onload handler’s image (image 3), but not the onload handler’s script (script 3). Curious.

IE 8 and 10 are the same: they load the four resources in the page as well as the image & script from the onload handler (resources 1-3). I didn’t test IE 9 but I assume it’s the same.

Opera has the best results in my opinion. It refetches all of the resources in the main page, the onload handler, and 1 millisecond after onload (resources 1-4), but it does not refetch the resources 5 seconds after onload (image 5 & script 5). I poked at this a bit. If I raise the delay from 1 millisecond to 50 milliseconds, then image 4 & script 4 are not refetched. I think this is a race condition where if Opera is still downloading resources from the onload handler when these first delayed resources are created, then they are also refetched. To further verify this I raised the delay to 500 milliseconds and confirmed the resources were not refetched, but then increased the response time of all the resources to 1 second (instead of instantaneous) and this caused image 4 & script 4 to be refetched, even though the delay was 500 milliseconds after onload.
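The race-condition guess can be written down as a toy model. It encodes only the hypothesis above, not Opera's actual logic: `wouldRefetch` and its parameters are hypothetical, and the model assumes the reload's refetch behavior lasts exactly as long as the onload handler's downloads are in flight.

```javascript
// Toy model of the hypothesis: a resource created `delayMs` after onload is
// refetched only if the onload handler's downloads (taking `responseMs`)
// are still in flight at that moment.
function wouldRefetch(delayMs, responseMs) {
  return delayMs < responseMs;
}

wouldRefetch(500, 0);    // false: near-instant responses, so read from cache
wouldRefetch(500, 1000); // true: 1 second responses are still in flight, so refetched
```

The 1 millisecond and 50 millisecond observations would also fit this model if the "instantaneous" responses actually took somewhere between those two values, which the tests above do not measure.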

Note that pressing shift+Reload (and other combinations) didn’t alter the results.

Takeaways

A bit esoteric? Perhaps. This is a deep dive on a niche issue, I’ll grant you that. But I have a few buts:

If you’re a web developer using far future expiration dates and lazy loading, you might get unexpected results when you change a resource and hit Reload, and even shift+Reload. If you’re not getting the latest version of your dev resources you might have to clear your cache.

This isn’t just an issue for web devs. It affects users as well. Numerous sites lazy-load resources with far future expiration dates including 8 of the top 10 sites: Google, YouTube, Yahoo, Microsoft Live, Tencent QQ, Amazon, and Twitter. If you Reload any of these sites with a packet sniffer open in the first four browsers listed, you’ll see a curious pattern: cacheable resources loaded before onload have a 304 response status, while those after onload are read from cache and don’t get refetched. The only way to ensure you get a fresh version is to clear your cache, defeating the expected benefit of the Reload button.

Here’s a waterfall showing the requests when Amazon is reloaded in Chrome. The red vertical line marks the onload event. Notice how the resources before onload have 304 status codes. Right after the onload are some image beacons that aren’t cacheable, so they get refetched and return 200 status codes. The cacheable images loaded after onload are all read from cache, so any updates to those resources are missed.

Finally, whenever behavior varies across browsers it’s usually worthwhile to investigate why. Often one behavior is preferred over another, and we should get the specs and vendors aligned in that direction. In this case, we should make Reload more consistent and have it refetch resources, even those loaded dynamically in the onload handler.

16 Responses to Reloading post-onload resources

  1. I am curious as to why this is the case: “(Technique) 5 probably shouldn’t be refetched.”

  2. Man you followed up on this fast! Wish I still had the time to run experiments.

    Interesting results that have brought up a few more questions…

    1. On Windows most browsers have a setTimeout least count of 15ms. Don’t think that’s an issue here though, but could you try with setImmediate as well?

    2. When you delayed the resources for 1 second in the Opera test, shouldn’t that have also delayed onload?

  3. I should also add that you’ll see the same behaviour when calling location.reload(true) from JavaScript inside the page.

  4. Myron: Since web apps can be long lived, it would generally be bad to ALWAYS refetch every dynamic resource if the web app was reloaded. Imagine that, hours or days after the app was reloaded, the user visits their photo album with a dynamic slideshow. IMO those photos should be read from cache and not refetched. So the browser has to decide at some point to stop refetching everything. 5 seconds seems like a good stopping point, albeit subjective and untested.

  5. Philip: Here’s a version with setImmediate. What browser are you going to test that on? Chrome, Safari, & Firefox are the most interesting ones to test (to see if they get more aggressive about refetching) but I don’t think they support it. Anyway, I don’t think it’s the 15ms setTimeout issue because I raised it to 40ms and it still didn’t refetch in Chrome.

    Here’s a version with the sleep=1. It does push out the onload, but the resources in question are loaded during or after the onload, so delaying onload doesn’t affect them.

  6. Steve: you’re right about the browser support for setImmediate. IE10 is the only one that supports it.

  7. I’m not sure I agree that it’s important for the Refresh button to reload non-expired post-onload resources.

    Many users refresh sites just to try to get new data (e.g. refreshing CNN.com for headlines). Refreshing the post-onload JS on that page, even if it’s just a 304 request, would be a waste of the user’s time.

    Indeed, when I know that everything on a page is cached except for short-lived data on the page, I’ve wished the browser would let me override the refresh button, allowing me to just re-request short-lived AJAX data without going through a full-page reload at all. Why make the user wait for all of that stuff, flash the screen, etc.?

  8. Dan: I can see your motivation for a “lighter” version of Reload, but sometimes people reload in order to refresh more than text – they actually want to refresh the JS or CSS, etc. Regardless, the point of this article is that all those resources are refetched if they occur before onload. Since more and more resources are being loaded after onload it would be more consistent to also refetch those.

  9. I suspect Webapp frameworks with their own asset compilation pipeline are going to win out. This is a shining example of why.

    If my page loads general.js asynchronously then you’re right, and I’m hosed by the reload semantics of the user agent. If it’s loading general-345636.js and the new page is loading general-753632.js then I have no problem.

  10. Jason: Great great point. You really should never change the contents of a public resource without also changing the URL.

  11. I have a fairly long writeup of IE’s behavior on Refresh here: http://blogs.msdn.com/b/ieinternals/archive/2010/07/08/technical-information-about-conditional-http-requests-and-the-refresh-button.aspx

    As noted in that article, a change was made to IE10 vis-à-vis refresh: IE10 applies cache-busting flags when XHR is used after a page is refreshed with CTRL+F5.

  12. To Dan’s point above: various folks contacted IE to request control of the refresh button for the very reason he describes: the user is *frequently* using the Refresh button on an AJAX-y site where the site’s script itself could do a better job of giving the user what they want (e.g. latest status updates or whatnot) without a flood of 304s.

    Ultimately, we elected not to make that change in IE10 for the very reasons you alluded to: if the user’s version of the page was corrupt for some reason, then such a change could be very bad for the user experience.

  13. Eric: Hi! Thanks for the link. Most of the postonload resources that are NOT refetched are regular images and scripts – they are not XHRs. Is the IE10 change just for XHRs, or does it also apply to “items pulled down by JavaScript”? In my tests it appears this change doesn’t happen for regular images and scripts in IE10.

  14. Why not just clear all current page assets from the browser cache on a Force-Reload?

  15. hi Steve

    thanks for the research.

    At the time of writing this comment, your example seems to have a problem: the cache headers on the resources you are using are set to +1 month, because you are using this kind of URL:

    https://stevesouders.com/bin/resource.cgi?type=js&sleep=0&expires=1&n=5

    I copied your code and tested with expires=-1 and expires=0. The results confirm your theory with expires=0, but there is something new with expires=-1:

    – Chrome with expires 0 and document.location.reload() :
    http://www.webpagetest.org/results.php?test=130418_GJ_a11707efcfa343cc5004b7723b413df9

    in this configuration, no expires header is set, resources onload and post-onload are taken from the cache, as in your results

    – Chrome with expires -1 and document.location.reload() :

    http://www.webpagetest.org/results.php?test=130418_S2_02fc87adbef561ce611bae44fee23504

    Here, the Expires header is set to 0 and the expiration date is in the past. Chrome seems to react in a sane way (for the developer at least) and DOES NOT USE CACHE.

    For reference, here is the +1 month cache setting as in your current example, which works as expected: http://www.webpagetest.org/result/130418_CT_e20e850d48299321efd65e869e76004b/1/details/

    I guess we should try to reproduce that with all the other browsers now and check whether the behaviour is the same for all. I’m also wondering how each browser interprets the lack of an Expires header in this case.

  16. Ok, did more tests using WPT to have my own table
    https://docs.google.com/spreadsheet/ccc?key=0AirWN83fytRxdGpoS24tRXVKZzB0d01oWTcwWEMxT3c&usp=sharing

    My goal was to answer the question: are post-onload resources loaded from cache or not?
    The answers vary between browsers with no cache header or with cache headers set to +1 month.
    However, the behaviour looks consistent with these headers:
    Expires: ” – 1 month ”
    Cache-Control: public, max-age=0
    no eTag or Last-Modified

    We should probably make more tests with eTag caching and so on.