Improving app cache

October 3, 2011 12:42 pm | 17 Comments

I recently found out about the W3C Workshop on The Future of Off-line Web Applications on November 5 in Redwood City. I won’t be able to attend (I’ll be heading to Velocity Europe), but I feel like app cache needs improving, so I summarized my thoughts and sent them to the workshop organizers. I also pinged some mobile gurus and got their thoughts on app cache.

My Thoughts

SUMMARY: App cache is complicated and frequently produces an unexpected user experience. It’s also being (ab)used as a workaround for the fact that the browser’s cache does not cache in an effective way – this is just an arms race for finite resources.

DETAILS: I’ve spoken at many mobile-specific conferences and meetups in the last few months. When I explain the way app cache actually works, developers come up afterward and say “now I finally understand what was happening with my offline app.” These are the leading mobile developers in the world.

John Allsopp does a great job of outlining the gotchas, and I’ve added some (slides 50&51):

HTML responses with the MANIFEST attribute are stored in app cache by default, even if they’re not in the CACHE: section of the manifest file.
If a CACHE: resource 404s then none of the resources are cached.
The manifest file must be changed in order for changed CACHE: resources to be updated.
Modified CACHE: resources aren’t seen by the user until the second time they load the app – even if they’re online.
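
The third gotcha, that the manifest itself must change before changed CACHE: resources are picked up, is commonly handled by putting a version comment in the manifest. Below is a minimal sketch of that idea, assuming a build step that has the resource bytes at hand. The function name build_manifest, the sample paths, and the choice of a SHA-256 digest are illustrative assumptions, not something prescribed by app cache.

```python
import hashlib


def build_manifest(resources):
    """Render a cache manifest whose comment line changes when any resource changes.

    `resources` maps a URL path to the bytes currently served for it
    (a hypothetical input shape chosen for this sketch).
    """
    digest = hashlib.sha256()
    for path in sorted(resources):
        # Hash both the path and the content so renames also change the version.
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(resources[path])

    lines = ["CACHE MANIFEST", "# version " + digest.hexdigest()[:12], "", "CACHE:"]
    lines.extend(sorted(resources))
    # Let everything not listed above go to the network when online.
    lines += ["", "NETWORK:", "*"]
    return "\n".join(lines) + "\n"


before = build_manifest({"/app.js": b"v1", "/app.css": b"body{}"})
after = build_manifest({"/app.js": b"v2", "/app.css": b"body{}"})
```

Here `before` and `after` differ only in the comment line, which is enough for the browser to treat the manifest as changed and re-download the CACHE: entries. The fourth gotcha still applies: the user sees the new files on the following load.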

It’s easy to point out problems – you folks have the more difficult job of finding solutions. But I’ll make a few suggestions:

Use updated resources on first load – The developer needs a way to say “if the user is online, then fetch (some/all) of the CACHE: resources that have changed before rendering the app”. I would vote to make this the default behavior, and provide a way to toggle it (in the manifest file or HTML attribute). Perhaps this should also be done at the individual resource level – “I want updated scripts to block the initial rendering, but nothing else”. The manifest file could have an indicator of which resources to check & download before doing the initial rendering.
404s – I haven’t tested this myself, but it seems like overkill. Every response in the CACHE: section should be cached, independent of the other responses. Perhaps this is browser-specific?
updateReady flag – It’s great that developers can use the updateReady event to prompt the user to reload the app if any CACHE: resources have changed underneath them, but the bar is too high. In addition, have a flag that indicates that the browser should prompt the user automatically if any CACHE: resources were updated.

Finally, on the topic of arms race, I know many websites that are using app cache as a way to store images, scripts, and stylesheets. Why? It’s because the browser’s disk cache is poorly implemented. App cache provides a dedicated amount of space for a specific website (as opposed to a common shared space). App cache allows for prioritization – if I have 10M of resources I can put the scripts in the CACHE: section so they don’t get purged at the expense of less painful images.

Certainly a better solution would be for the browsers to have improved the behavior of disk cache 5 years ago. But given where we are, an increasing number of websites are consuming the user’s disk space. In most cases the user doesn’t have a way or doesn’t know how to clear app cache. Better user control over app cache is needed. I suggest that clearing “data” clears both the disk cache and app cache. Alternatively, we extend the browser UI to have an obvious “clear app cache” entry. Currently in Firefox and Chrome you can only clear app cache on a site-by-site basis, and the UI isn’t obvious. In Firefox it’s under Tools | Options | Advanced | Network | Remove. In Chrome it’s under chrome://appcache-internals/.

The most important near term fix is better patterns and examples.

My first offline app had a login form on the index.html – how should I handle that?
What if the JSON data in app cache requires authentication and the user is offline – use it or not?
I’ve never seen an example that uses the FALLBACK: section.

Adoption of current app cache would go much more smoothly with patterns and examples that address these gaps, and perhaps a JS helper lib to wrap updateReady and other standard dev tasks.
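
As one possible example for the FALLBACK: gap, here is a small sketch of a manifest with a FALLBACK: section, plus a Python model of how a fallback is chosen by longest matching namespace prefix. The manifest paths and the helper names parse_fallbacks and offline_fallback are made up for illustration, and the model deliberately ignores how CACHE: and NETWORK: entries interact with fallbacks in a real browser.

```python
# Hypothetical manifest for illustration; the paths are not from any real site.
MANIFEST = """CACHE MANIFEST
# v1

CACHE:
/index.html
/app.js

FALLBACK:
/images/ /images/offline.png
/ /offline.html

NETWORK:
/api/
"""


def parse_fallbacks(manifest_text):
    """Return (namespace, fallback_url) pairs from the FALLBACK: section."""
    pairs = []
    section = "CACHE"  # entries before any section header count as CACHE entries
    for raw in manifest_text.splitlines()[1:]:  # skip the CACHE MANIFEST line
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line in ("CACHE:", "NETWORK:", "FALLBACK:"):
            section = line[:-1]
            continue
        if section == "FALLBACK":
            parts = line.split()
            if len(parts) == 2:
                pairs.append((parts[0], parts[1]))
    return pairs


def offline_fallback(path, pairs):
    """Pick the fallback for `path` by longest matching namespace prefix, or None."""
    matches = [(ns, url) for ns, url in pairs if path.startswith(ns)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]


pairs = parse_fallbacks(MANIFEST)
image_fallback = offline_fallback("/images/logo.png", pairs)
page_fallback = offline_fallback("/about.html", pairs)
```

In this sketch an uncached image under /images/ resolves to /images/offline.png, while any other uncached page resolves to /offline.html. Namespaces are plain prefixes, not wildcards or patterns.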

Mobile Gurus

A great email thread resulted when I asked a bunch of mobile gurus for their thoughts about app cache. Here’s a summary of the comments that resulted:

Scott Jehl	Agreed on app cache’s clumsiness. It’s so close though! The cache clearing is terrible for both users and developers.
Nicholas Zakas	+1 for AppCache clumsiness. My big complaint is requiring a special MIME type for the manifest file. This effectively limits its use to people who have access to their server configuration.
Yehuda Katz	My biggest concern is the lack of a feature that would make it possible to load the main index.html from cache, but only if the user agent is offline. Currently, if the user agent is online, the entire cache manifest, including the main index.html, is used. As a result, developers are required to come up with some non-standard UI to let the application user know that they should refresh the page in order to get more updated information. This is definitely the way to get the most performance, even when the user agent is online, but it creates an extremely clumsy workflow which significantly impedes adoption. I have given a number of talks on the cache manifest, and this caveat is the one that changes the audience reaction from nodding heads to “oh no, another thing I have to spend time working out how to rebuild my application in order to use”. Again, I understand the rationale for the design, but I think a way to say “if the user agent is online, block until the cache manifest is downloaded” would significantly improve adoption and widen the appropriate use-cases for the technology.
Scott Jehl	I agree – the necessary refresh is the biggest downfall for me, too. It’s really prohibitive for using appcache in progressive enhancement approaches (where there’s actually HTML content in the page that may update regularly). It’d be great if you could set up appcache to kick in when the user is actually offline, but otherwise stay out of the way and let the browser defer to normal requests and caching.
Yehuda Katz	I actually think we can get away with a more aggressive approach. When the device is online, first request the application manifest. If the manifest is identical, continue using the app cache. This means a short blocking request for the app manifest, but the (good) atomic cache behavior. If the manifest is not identical, fall back to normal HTTP caching semantics. It needs to be a single flag in the manifest I think.
Dion Almaer	Totally agree. In a recent mobile project we ended up writing our own caching system that had us use HTTP caching… It was very much a pain to have to do this work.

I like Yehuda’s suggestion about a blocking manifest check when the user is online, controlled by a flag in the manifest file. We need more thinking around how to improve app cache. Please check out the W3C Workshop on The Future of Off-line Web Applications website and send them your thoughts.

17 Responses to Improving app cache

Randy Peterman | 03-Oct-11 at 3:03 pm | Permalink |

I would like to see an authenticated caching mechanism. Example, if a user could authenticate some way /then/ they get access to the (encrypted) cached data. So that if I have a computer with several users trying to log into my web app I can prevent them from just doing open ended queries on cached data, but instead they can only access their data. I know this is a pretty massive leap, but it would be huge for apps with lots of personal data.
BS-Harou | 03-Oct-11 at 3:30 pm | Permalink |

Also, the App Cache should have imo better error handling. DOMExceptions aren’t very helpful.
Christopher Blizzard | 03-Oct-11 at 4:22 pm | Permalink |

Hi. So I’d like to make sure we’re separating the two issues that are kind of being overloaded here:

1. Better performance through aggressive caching.

2. An application that’s whole when you’re offline.

If you think about these two different use cases you can see why the 404 issue is an example. The app cache is really designed to handle the second use case, not the first.

I’m fine with overloading the app cache for general performance, but I also want to make sure that we’re knowingly doing so. I haven’t seen other people make this observation; they’re just calling app cache broken as a result.

I also think that we need to fix caching in general, and we can probably overload the app cache to do so. It’s just a different use case, that’s all.

Related: there’s also a lot of issues around how we prompt users, what the expectations are and how it fits into some of the new browser designs around apps. So there’s a huge amount of work that needs to be done.
Guypo | 03-Oct-11 at 5:21 pm | Permalink |

I agree with most of what’s said above, but am uncomfortable with making Yehuda’s “block if we’re online” the default.

Being online can still mean you’re on a 3G network in a bad reception spot, and it’ll take 5 seconds before you manage to pull off a single request and response. Combine that with the fact that many websites’ resources don’t change that frequently (especially apps that load resources with AJAX calls anyway), and you may be harming the user experience.

So my vote would be to add such a flag, but not make it the default, but rather an opt in.
Stan Wiechers | 03-Oct-11 at 5:41 pm | Permalink |

I used app cache for hyper-books extensively and am not very happy with the way it works as well. On top of your suggestions I would like

– wildcards for resources, cache all png’s *.png
– a base url for the manifest. i want to be able to say that example.com/ and example.com?id=5 and example.com/next/id with mod_rewrite use one cache. all those bust the app cache. you need to use hash based urls to avoid that.
– an easy way to remove the app cache from a browser. in the current version of chrome I still don’t see how I can do that, i can look at the cache but am not able to delete it. this would be very helpful for development.

if i would do a cached app again i prob would store many resources (flagged as network resource) in a client side database rather than the cache and reload them if there is a network connection and a flag on the server indicates a new version. this makes things more complicated but at least you save the reload.
Larry Garfield | 03-Oct-11 at 6:14 pm | Permalink |

Wow, how timely! I was just working on Manifest file support for the Drupal CMS last week for “aggressive caching” purposes, and decided today that the “cache the html page that sent it” logic makes that approach impossible. The HTML page is going to be very dynamic, but images and such won’t be.

Whether (ab)using the appcache as a developer-defined aggressive caching mechanism is the right approach or not, I don’t know. However, I do agree that we need some more robust caching mechanism other than “trust the browser to figure it out”.

Especially as more and more web sites these days are powered by a CMS, having some central “resources have been updated” or “these specific resources have been updated” message that the server can send is increasingly viable. When the page is dynamic, it can carry with it time-sensitive information about the resources it links to beyond just the URI and browsers should be able to leverage that.

In a sense, HTML5 moves a lot of functionality from the server back to the client… but cache control should at the same time be moved from the client back to the server, in a smarter way than simply expires headers.

As I said, right now the Manifest file is the only viable mechanism, and it kinda sucks for all of the reasons listed above. Whether beefing it up to become that “smart caching” channel is a good idea or if we should be introducing some other mechanism, I don’t know, but smart caching belongs on the server.

I look forward to what comes out of this conversation.
Julien Van Den Bossche | 03-Oct-11 at 11:01 pm | Permalink |

After studying cache limitations on mobile, I ran some tests on the application cache on mobile (Android, iPhone, BlackBerry, Bada). Some limitations appeared. Hope it helps. http://www.winktoolkit.org/blog/235/
Jake Archibald | 03-Oct-11 at 11:46 pm | Permalink |

The refresh-to-see-new-stuff feature is confusing, but the most confusing thing I found was the need to put

NETWORK:
*

to prevent non-cached resources failing to download, even when online.

Also, wildcards cannot be used in FALLBACK URLs, meaning if you want to provide a fallback png, and a different fallback jpeg, they need to be in separate directories, or the full URLs must be provided for each image. IMO fallback should use regex, like mod_rewrite.
Jake Archibald | 04-Oct-11 at 12:03 am | Permalink |

Also, assuming each page pointing to the manifest isn’t ideal. The user may be given a deep link to a page that is covered by a fallback url. I’d like it to download the fallback pages and other explicitly cached resources, but not the page currently viewed.
Shwetank | 04-Oct-11 at 4:32 am | Permalink |

>I’ve never seen an example that uses the FALLBACK: section

I’m assuming you mean an example of someone using FALLBACK: in a production site right?
Brian | 04-Oct-11 at 12:56 pm | Permalink |

There is one scenario I never see mentioned that is possibly the gotcha of the application cache.

Suppose you have a webpage that requires authentication and you normally have index.php do a 302 redirect to login.php if the session has expired. You decide to use the application cache so you rework some code and add the redirect in the JavaScript, but you forget to or intentionally don’t remove the 302. It seems relatively harmless since A) first time users should go to login.php anyway and B) even if the cache load fails on the 302 the first time, the user will log in again, triggering another successful cache event. (Never mind the fact that a login redirect is wasteful and you shouldn’t be serving a dynamic page in a cached application.)

The problem exists if you don’t include index.php explicitly in the manifest. According to the spec, if you get a redirect AND aren’t in an explicit section, the previously cached version of the file is used in the new cache. This means that you will have version 2 of your JavaScript and CSS and version 1 of your HTML!

The solutions are to remove the redirect and to include index.php in the manifest but this is something you may not realize until after you release version 2 into production.
Gerben | 06-Oct-11 at 4:42 am | Permalink |

I don’t see the MIME problem that Nicholas describes. Users can just create a PHP file for the manifest and use header('Content-Type: text/cache-manifest').

I already use PHP to generate my manifest file dynamically. It gets all the resources from a specified directory; lists them in the CACHE: section; and also calculates the latest last-modified from all the files and puts that in a comment. So whenever a file is added, removed, or modified, the manifest changes, triggering a re-download.
louisremi | 06-Oct-11 at 9:34 am | Permalink |

“I want to be able to say that example.com/ and example.com?id=5 and example.com/next/id with mod_rewrite use one cache. all those bust the app cache.”
@Stan Wiechers: I thought that was already the case. All pages referring to the same appcache belong to the same cache group according to the spec. This is how it works in Firefox according to my tests.
Larry Garfield | 07-Oct-11 at 8:51 am | Permalink |

I’ve posted some follow-up thoughts on the aggressive smart caching question here: http://www.garfieldtech.com/blog/caching-tng inspired by this thread.

Feedback welcome.
Mark Nottingham | 11-Oct-11 at 10:14 pm | Permalink |

+1 to Chris’ comment that we need to separate the problems out; app packaging / user experience is a big part of the problem.

In case you didn’t see it, I did an exploration of where HTTP caching might go to take some of the pressure off:
http://www.mnot.net/blog/2011/08/28/better_browser_caching
Bhashit | 06-Apr-12 at 6:14 pm | Permalink |

Perhaps we are not looking at the AppCache in the right way. AppCache, in my opinion, is basically for Web-Applications and not normal websites that update on a daily, or otherwise regular, basis. And by applications I mean actual replacements for desktop applications. Web applications that behave like desktop apps. They may exchange a lot of data over the net when they are online, but when they are offline, they let the user do the work and store the state for later update to the server. We don’t see the desktop applications prompt regularly that a new update is available. They are updated periodically and it is okay to prompt the user to reload (or restart in case of desktop apps) once in a while after an update.
Sarath | 05-Jun-12 at 7:55 pm | Permalink |

In my first two days, I found a lot of issues/caveats. Thanks to a lot of blogs, I was able to gather some information, but I struggled quite a bit. I blogged my findings here, please correct me if anything is wrong.
http://blog.sarathonline.com/2012/06/appcache-manifest-file-issuescaveats.html

I just wonder – why, if so many do not like appcache the way it is, has it been approved?

-Sarath.

SteveSouders.com
