(lack of) Caching for iPhone Home Screen Apps

June 28, 2011 10:14 pm | 9 Comments

Yesterday’s post, Unexpected Reloads in WebKit, revealed an interesting behavior that affects caching in Safari:

When you load the same URL back-to-back in Safari, the second load is treated the same as hitting Reload.

This is bad for performance because the browser issues a Conditional GET request for each resource instead of using the cached resource.

It’s important to be aware of this behavior when testing the primed cache experience in Safari, so web performance engineers should take note. However, in the real world it’s unlikely this behavior has much of an impact on desktop users. Here’s the table from yesterday’s post that shows how this Reload-like behavior is triggered when re-requesting a page:

way of loading URL again                     like Reload?
hit RETURN in location field                 yes
delete URL and type it again                 yes
launch same URL via bookmark                 yes
click link to same URL                       yes
go to another URL then type 1st URL again    no
modify querystring                           no
enter URL in a new tab                       no
Table 1. Triggering reload behavior in Safari

It’s possible that real world users might type the same URL or open the same bookmark two times in a row in the same tab, but it probably doesn’t happen that often. So what’s the big deal?

So what’s the big deal?

Whenever I see strange performance behavior I think about where that behavior might have a significant impact. Is there any place where this back-to-back Safari Reload behavior could have a significant impact? A comment from yesterday’s post hints at the answer:

Why is this article named “Unexpected Reloads in WebKit”?

Chrome is based on Webkit and doesn’t has same issue. Perhaps it would be less confusing to name it “Unexpected Reloads in Safari”.

Other people gave me the same feedback on the backchannel – why did I say “WebKit” instead of “Safari”?

Here’s why: WebKit is used in a lot of browsers. Whenever I see a bug (or a feature) in one popular WebKit-based browser I wonder if it exists in others. The main WebKit-based browsers I focus on are Chrome, Safari, Android, and iPhone. As soon as I noticed this behavior in Safari my next step was to conduct the same tests in Chrome, Android, and iPhone. As the commenter noted, this unexpected Reload behavior does not happen in Chrome. And it does not happen on Android (tested on my Nexus S). But it does happen on iPhone.

Update June 29: In a comment on yesterday’s post, Libo Song correctly pointed out that this back-to-back Reload-like behavior does happen on Android. He tested on Nexus One and I confirmed on Nexus S. Although Android does exhibit the Reload-like behavior when the same URL is entered back-to-back in the same tab, this doesn’t happen very often. The more important issue that is the focus of this post is how this Reload-like behavior slows down the launching of home screen apps. In this regard Android does not exhibit the Reload-like behavior when home screen apps are launched. Here’s a HAR file showing Amazon being loaded twice from the home screen with fast.stevesouders.com in between. The second launch doesn’t generate any HTTP requests.

While it’s true that iPhone users are unlikely to manually launch the same URL twice in a row in the same tab, there is a situation where this happens automatically: when launching home screen apps.

Home screen apps are a powerful feature on iPhone and Android that allow users to save URLs to the home screen and launch them much like native apps. Unfortunately, launching home screen apps on the iPhone triggers something similar to the Reload behavior we see in Safari – where resources aren’t read from cache and instead generate extra HTTP requests. Let’s take a look at a few examples of home screen apps, moving from simple to more complex.

Amazon: simple URL

Typing http://www.amazon.com/ into the iPhone browser displays a version of Amazon’s front page that is customized for mobile – there’s less content, the images are smaller, etc. However, there is not a prompt to save the URL to the home screen. We can do that anyway using the arrow function key at the bottom of the screen and selecting “Add to Home Screen”.

If you’ve used home screen apps you might have noticed that they always open in the same browser tab. Let’s run a little test to confirm this:

  1. Click the Amazon home screen icon. This opens Amazon in mobile Safari.
  2. Open another tab by clicking the “pages” function key and opening a “New Page”. Enter some non-Amazon URL in this new tab, for example http://fast.stevesouders.com/ (a very lightweight page I use for testing). At this point we have at least two tabs, one with Amazon and one with fast.stevesouders.com, and we’re looking at the fast.stevesouders.com tab.
  3. Go back to the home screen and click the Amazon icon again.
  4. Note that you’re taken back into mobile Safari to the first tab that contains Amazon.

We just opened the exact same URL back-to-back in the same tab. We didn’t do it intentionally – that’s the default behavior for iPhone home screen apps. Here’s a waterfall chart for this test. (You can view an interactive waterfall by loading the HAR file in pcapperf.)

The home screen app URL is http://www.amazon.com/gp/aw/h.html/187-9233150-9797455. The first time the home screen app is launched starts at the top with 187-9233150-9797455. Since the cache was empty all the subsequent resources have 200 responses. There are some 404s for icons followed by the request for fast.stevesouders.com.

The second launch of the Amazon home screen app (187-9233150-9797455 below fast.stevesouders.com) is where it gets interesting. When the Amazon home screen app is launched the second time, a Conditional GET request is made for all of the resources even though these resources are in the cache with a future expiration date.

All of the resources that are re-requested have an expiration date more than 10 years in the future. For example, the response headers for title_gradient._V233984477_.png are:

content-length: 291
expires: Tue, 06 May 2031 21:44:21 GMT
last-modified: Mon, 10 Aug 2009 11:50:45 GMT
cache-control: max-age=626560472
date: Wed, 29 Jun 2011 01:09:49 GMT
content-type: image/png

We know it was cached because when the Amazon home screen app is launched the second time the Conditional GET request for title_gradient._V233984477_.png has an If-Modified-Since header that contains the last-modified date in the initial response:

if-modified-since: Mon, 10 Aug 2009 11:50:45 GMT
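To make the "should have been read from cache" claim concrete, here is a minimal Python sketch that computes the freshness lifetime from the response headers shown above. It assumes the usual HTTP caching rule that max-age takes precedence over Expires. The function names and the "five minutes later" second launch are illustrative assumptions, not part of the original test.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Response headers copied from the title_gradient._V233984477_.png example above.
RESPONSE_HEADERS = {
    "expires": "Tue, 06 May 2031 21:44:21 GMT",
    "last-modified": "Mon, 10 Aug 2009 11:50:45 GMT",
    "cache-control": "max-age=626560472",
    "date": "Wed, 29 Jun 2011 01:09:49 GMT",
}


def freshness_lifetime(headers):
    """Seconds the response may be reused without revalidation.

    max-age wins over Expires; with neither header the lifetime is 0.
    """
    match = re.search(r"max-age=(\d+)", headers.get("cache-control", ""))
    if match:
        return int(match.group(1))
    if "expires" in headers and "date" in headers:
        delta = (parsedate_to_datetime(headers["expires"])
                 - parsedate_to_datetime(headers["date"]))
        return max(0, int(delta.total_seconds()))
    return 0


def is_fresh(headers, now):
    """True if the cached response is still within its freshness lifetime."""
    age = (now - parsedate_to_datetime(headers["date"])).total_seconds()
    return age < freshness_lifetime(headers)


# Hypothetical second launch, five minutes after the first response.
second_launch = parsedate_to_datetime(RESPONSE_HEADERS["date"]) + timedelta(minutes=5)
print(freshness_lifetime(RESPONSE_HEADERS))       # 626560472 seconds, nearly 20 years
print(is_fresh(RESPONSE_HEADERS, second_launch))  # True
```

Under these assumptions the image is still fresh at the second launch, so a cache following the normal rules would not need to send the Conditional GET at all.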

It appears that we’ve stumbled into the Reload-like behavior we saw in Safari on the desktop. Further evidence: if you launch the home screen app, type a new URL over the Amazon URL, and then launch the home screen app again, the resources are read from cache instead of generating numerous Conditional GET requests. (Load this HAR file in pcapperf to see for yourself.)

Untappd: full screen app

Amazon was a simple home screen app – really just a bookmark on the home screen. Developers can do more with home screen apps to make them launch and look like native apps. As described in Apple’s How-To’s for Safari on iPhone, various parts of the home screen app user experience are customizable including the home screen icon, viewport, and zooming and scaling. Developers can also have their home screen app launch in “full screen mode” by hiding the Safari UI components, including the status bar and location bar. In this situation, every time the home screen app is launched it uses the same “tab” with the exact same URL – thus triggering the Reload behavior.

Let’s have a look at Untappd on the iPhone. The first time you navigate to http://untappd.com/ in iPhone’s browser you get a suggestion to add the web app to the home screen:

After which you’ll have a customized Untappd home screen icon:

Now let’s investigate how caching works for this home screen app. We start by clearing the cache then launching the home screen app. You’ll notice there is no location bar or other Safari controls. Then we go back to the home screen and launch the Untappd home screen app again. The waterfall chart is shown below. (Here’s the HAR file.)

The first time the Untappd home screen app is launched it loads seven HTTP requests. Three of these resources are cacheable: jquery.min.js (1 year), gears_init.js (1 hour), and ga.js (1 day). Loader.gif and ajax-loader.png don’t have a future expiration date, but they do have Last-Modified and ETag response headers that could be used in a Conditional GET request.

But we see that the second time Untappd is launched from the home screen, all of the resources are re-requested. To make matters worse, none of these are Conditional GET requests, so a 200 status code is returned with the full response body.

The punchline

It’s unfortunate that home screen apps suffer from this bad caching behavior on the iPhone. Thankfully, there is a workaround: application cache. I ran similar tests on other home screen apps that use application cache. The resources listed in the CACHE: section of the manifest file were used on the iPhone without generating Conditional GET requests.
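For readers who haven't seen a manifest file, here is a rough Python sketch that generates one. The resource paths and version string are hypothetical placeholders, and the helper name build_manifest is mine. The layout (a first line of CACHE MANIFEST, a CACHE: section, and a NETWORK: section) follows the HTML5 application cache format as I understand it.

```python
def build_manifest(cached_urls, version):
    """Return the text of an application cache manifest.

    cached_urls are listed under CACHE: so the browser stores them explicitly.
    Changing the version comment changes the manifest bytes, which prompts the
    browser to fetch the listed resources again.
    """
    lines = ["CACHE MANIFEST", f"# version {version}", "", "CACHE:"]
    lines.extend(cached_urls)
    # "*" under NETWORK: lets everything not listed above go to the network.
    lines += ["", "NETWORK:", "*"]
    return "\n".join(lines) + "\n"


# Hypothetical resource paths, for illustration only.
print(build_manifest(["/js/app.js", "/css/app.css", "/img/logo.png"], "2011-06-28"))
```

The page opts in by pointing the manifest attribute of its html element at this file, and the server is expected to send it with the text/cache-manifest content type.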

I feel bad about recommending the use of application cache. This is an issue with the browser cache on mobile Safari (and to a lesser degree on desktop Safari) that should be fixed. It’s a significant amount of work for developers to adopt application cache. The plus side is that doing so achieves the ability to work offline.

After this lengthy analysis and numerous waterfalls, here’s the punchline in a nutshell:

Home screen apps on iPhone are slower because resources are re-requested even though they should be read from cache. Use application cache to avoid this performance problem.

Update Oct 12: Home screen apps in iOS 5 do not exhibit this problem. Blaze.io reports that home screen apps use caching as expected. They also have faster JS, likely due to the integration of the Nitro engine.


Unexpected Reloads in WebKit

June 27, 2011 4:47 pm | 9 Comments

People who work on web performance often need to load the same URL over and over again. Furthermore, they need to do this while simulating a real user’s empty cache experience and primed cache experience. When I want to analyze the empty cache experience the flow is simple: go to about:blank, clear the browser cache, enter the URL, and hit RETURN.

But what’s the right way to fetch a page repeatedly when analyzing the primed cache experience?

The main goal when testing the primed cache version of a page is to see which resources are read from cache. The goal for better performance is to cache as many responses as possible, thus reducing the number of requests made when the cache is primed. If a resource has an expiration date in the future, the browser uses the cached version and doesn’t have to make an HTTP request, resulting in a faster page. If a resource is expired (the expiration date is in the past) the browser issues a Conditional GET request using the If-Modified-Since and If-None-Match request headers. If the resource hasn’t changed then the server returns a simple 304 status code with no body. This is faster (because there’s no response body) but still takes time to do the HTTP request. (See my article on ETags for examples of IMS and INM.)

One way to re-request a page is to hit the Reload button, but this doesn’t give an accurate portrayal of the typical primed cache user experience. Hitting Reload causes the browser to always make an IMS/INM request for resources in the page, even for cached resources that have an expiration date in the future. Normally these resources would be used without generating an HTTP request. Although users do occasionally hit the Reload button it’s more likely that they’ll navigate to a page via a link or the location field, both of which avoid the time consuming Conditional GET requests generated when hitting Reload.
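The three cases described above (fresh cache hit, expired resource, and Reload) can be sketched as a small decision function. This is a simplified Python model for illustration, not how any particular browser implements its cache. The cache entry layout, the action names, and the ETag value are assumptions of mine.

```python
def plan_request(cached, now, reload=False):
    """Decide how to obtain one resource.

    cached is None for an empty cache, otherwise a dict with 'expires_at'
    (a timestamp in seconds) and optional 'last_modified' and 'etag' values.
    Returns (action, extra_request_headers).
    """
    if cached is None:
        return "GET", {}  # empty cache: plain request
    if not reload and now < cached["expires_at"]:
        return "USE_CACHE", {}  # still fresh: no HTTP request at all
    # Expired, or the user hit Reload: revalidate with IMS/INM if possible.
    headers = {}
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    return ("CONDITIONAL_GET" if headers else "GET"), headers


# Hypothetical cache entry; the ETag value is made up.
entry = {
    "expires_at": 2000.0,
    "last_modified": "Mon, 15 Feb 2010 23:30:12 GMT",
    "etag": '"abc123"',
}
print(plan_request(entry, now=1000.0))               # fresh: USE_CACHE
print(plan_request(entry, now=3000.0))               # expired: CONDITIONAL_GET
print(plan_request(entry, now=1000.0, reload=True))  # Reload: CONDITIONAL_GET
```

The last call is the case that matters for testing: the entry is still fresh, yet Reload forces a round trip for it anyway.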

The technique I adopted years ago for re-requesting a page when testing the primed cache is to click in the location field and hit RETURN. That’s a fine approach in IE, Firefox, Chrome, and Opera, but not in Safari. Let’s investigate why.

hitting RETURN in the location field

I’m using Untappd as an example. Untappd has 68 requests when loaded on the desktop. Figure 1 shows the waterfall chart for the first 31 requests when loaded in Firefox 4 with an empty cache:

Figure 1. untappd.com – Firefox 4 – empty cache

Most of the resources shown in Figure 1 have an expiration date in the future and therefore won’t generate an HTTP request if the user has a primed cache. To test that I click in the location field and hit RETURN. The resulting waterfall chart is shown in Figure 2. Sure enough the number of HTTP requests drops from 68 to 4!

Figure 2. untappd.com – Firefox 4 – primed cache

If you repeat this experiment in Chrome, Firefox, Internet Explorer, and Opera you’ll get similar results – empty cache generates 68 requests, primed cache generates 4 requests. However, the result is very different in Safari 5. It’s important to understand why.

Safari is different

This test shows that Untappd has done a good job of optimizing the primed cache experience – the number of HTTP requests made by the browser drops from 68 to 4. Running the same test in Safari 5 produces different results. Clearing the cache and loading untappd.com in Safari 5 loads 68 HTTP requests – just as before. To test the primed cache experience we click in the location field and hit RETURN. Instead of only 4 requests there are 68 HTTP requests.

Why are there 64 more HTTP requests in Safari 5 for the primed cache test? Looking at the HTTP request headers we see that these are all Conditional GET requests. Let’s use http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js as the example (it’s the 8th request in Figure 1). In the empty cache scenario the HTTP request headers are:

Accept: */*
Cache-Control: max-age=0
Referer: http://untappd.com/
User-Agent: Mozilla/5.0 (Macintosh; [snip...] Safari/533.20.27

The HTTP status code returned for that empty cache request is 200 OK.

In the primed cache test when we hit RETURN in the location field we see that the request for jquery.min.js contains an extra header:

Accept: */*
Cache-Control: max-age=0
If-Modified-Since: Mon, 15 Feb 2010 23:30:12 GMT
Referer: http://untappd.com/
User-Agent: Mozilla/5.0 (Macintosh; [snip...] Safari/533.20.27

The header that’s added in the primed cache test is If-Modified-Since. This is a Conditional GET request. The HTTP status code that’s returned is 304 Not Modified. Even though all I did was hit RETURN in the location field, Safari treated that like hitting the Reload button.
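Checking request headers by hand gets tedious with 68 requests, so here is a minimal Python sketch that scans a HAR capture for Conditional GET requests. It assumes the standard HAR layout (log, entries, request, headers as name/value pairs). The sample below is a hand-built fragment for illustration, not the actual capture from this test.

```python
CONDITIONAL_HEADERS = {"if-modified-since", "if-none-match"}


def conditional_gets(har):
    """Return the URLs of HAR entries whose request carries IMS or INM."""
    urls = []
    for entry in har["log"]["entries"]:
        names = {h["name"].lower() for h in entry["request"]["headers"]}
        if names & CONDITIONAL_HEADERS:
            urls.append(entry["request"]["url"])
    return urls


JQUERY = "http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"

# Hand-built fragment: only the jquery.min.js headers come from the text above.
SAMPLE_HAR = {"log": {"entries": [
    {"request": {"url": "http://untappd.com/",
                 "headers": [{"name": "Accept", "value": "*/*"}]}},
    {"request": {"url": JQUERY, "headers": [
        {"name": "Accept", "value": "*/*"},
        {"name": "Cache-Control", "value": "max-age=0"},
        {"name": "If-Modified-Since", "value": "Mon, 15 Feb 2010 23:30:12 GMT"},
    ]}},
]}}

print(conditional_gets(SAMPLE_HAR))  # lists only the jquery.min.js URL
```

With a real capture you would load the file with json.load and pass the result to conditional_gets. A primed cache run that returns a long list is a sign that Reload-like behavior was triggered.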

unexpected “reload” in WebKit

Unlike other browsers, Safari 5 treats hitting RETURN in the location field the same as clicking the Reload button. When else does this happen? Assuming you’ve loaded a URL in Safari and are looking at that page, this table lists various ways to load that URL again. For each technique I show whether loading the URL this way generates extra Conditional GET requests similar to clicking Reload.

way of loading URL again                     like Reload?
hit RETURN in location field                 yes
delete URL and type it again                 yes
launch same URL via bookmark                 yes
click link to same URL                       yes
go to another URL then type 1st URL again    no
modify querystring                           no
enter URL in a new tab                       no
Table 1. Triggering reload behavior in Safari

This black box testing indicates that whenever the same URL is loaded back-to-back in the same tab, Safari 5 treats it as a Reload. I was describing this behavior to Jay Freeman (saurik) at Foo Camp. He pointed me to this code from WebCore:

else if (sameURL)
   // Example of this case are sites that reload the same URL with a different cookie
   // driving the generated content, or a master frame with links that drive a target
   // frame, where the user has clicked on the same link repeatedly.
   m_loadType = FrameLoadTypeSame;

Searching in that same file for FrameLoadTypeSame we find this code:

case FrameLoadTypeReload:
case FrameLoadTypeReloadFromOrigin:
case FrameLoadTypeSame:
case FrameLoadTypeReplace:
   history()->updateForReload();
   m_client->transitionToCommittedForNewPage();
   break;

This code doesn’t account for the behavior, but it does show that FrameLoadTypeSame and FrameLoadTypeReload are treated as similar cases in this context, and perhaps that’s why IMS/INM requests are generated.

One important takeaway from this is: don’t hit RETURN in the location field to test primed cache experience in Safari. Instead, go to a different URL and then type the test URL in the location field, or open a new tab and type the URL.
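The rule suggested by Table 1 (same URL, back-to-back, same tab) is simple enough to model in a few lines of Python. This is a toy model of the observed black box behavior only. It is not WebKit's implementation, and the Tab class is a name I made up for illustration.

```python
class Tab:
    """Toy model of one Safari 5 tab, based only on the results in Table 1."""

    def __init__(self):
        self.current_url = None

    def load(self, url):
        """Load url in this tab and report whether it is treated like Reload."""
        like_reload = (url == self.current_url)
        self.current_url = url
        return like_reload


tab = Tab()
tab.load("http://untappd.com/")                  # first load, empty history
print(tab.load("http://untappd.com/"))           # True: RETURN, bookmark, or link to same URL
print(tab.load("http://fast.stevesouders.com/")) # False: a different URL
print(tab.load("http://untappd.com/"))           # False: another URL was loaded in between
print(tab.load("http://untappd.com/?a=1"))       # False: modified querystring
print(Tab().load("http://untappd.com/"))         # False: new tab has no previous URL
```

The fourth call is the recommended testing technique: visiting a different URL first means the test URL is no longer a back-to-back repeat.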

There’s a second more important takeaway from this. I’ll cover that in tomorrow’s post. If you know the answer, please don’t spoil it. Oh what the heck – if you think you know the answer go ahead and add a comment.


HTTP Archive: 1M URLs, Internet Archive, Sponsors

June 15, 2011 10:28 am | 4 Comments

The HTTP Archive provides a permanent record of web performance information. It started in October 2010 crawling 1K URLs. This was possible thanks to Pat Meenan’s help providing access to WebPagetest. A month later we increased coverage to the world’s top ~18K URLs. That was good, but the next step is 1M URLs. Today at Velocity I made two announcements that pave the way for achieving this goal.

Starting today the HTTP Archive is part of the Internet Archive. I met Brewster Kahle several years ago and have always admired the work the Internet Archive has done building a “digital library of Internet sites.” When I approached him about this merger we both saw it as an obvious fit. In addition to preserving a record of the content of these sites (via the Wayback Machine) we agreed it’s important to record how that content is built and served. It makes sense that researchers, historians, and scholars be able to find both sets of information under one roof. I’ll continue to run the HTTP Archive project.

The following companies have agreed to sponsor the work of the HTTP Archive: Google, Mozilla, New Relic, O’Reilly Media, Etsy, Strangeloop, and dynaTrace Software. In order to grow to 1M URLs we need data center space, servers, licenses, etc. Thanks to these sponsors we’ve started to build out this infrastructure and will be increasing our coverage soon.

I look forward to working with the Internet Archive on our mission of preserving a record of the Web for generations to come. If you would like to join the effort, I invite you to make a donation to the Internet Archive and contribute your coding skills to the open source project.

 
