Revving Filenames: don’t use querystring
It’s important to make resources (images, scripts, stylesheets, etc.) cacheable so your pages load faster when users come back. A recent study from Website Optimization shows that the top 1000 home pages average over 50 resources per page! Being able to skip 50 HTTP requests and just read those resources from the browser’s cache dramatically speeds up pages.
This is covered in my book (High Performance Web Sites) and YSlow by Rule 3: Add an Expires Header. It’s easy to make your resources cacheable – just add an Expires HTTP response header with a date in the future. You can do this in your Apache configuration like this:
<FilesMatch "\.(gif|jpg|js|css)$">
  ExpiresActive On
  ExpiresDefault "access plus 10 years"
</FilesMatch>
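To make the effect of "access plus 10 years" concrete, here is a small Python sketch that computes the Expires value for a given access time. It is not how Apache implements the directive; the function name `far_future_expires` is made up for illustration, and it assumes a "year" counts as 365 days, which is consistent with the Expires values in the headers shown later in this post.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumption: ten "years" of 365 days each (3650 days total).
TEN_YEARS = timedelta(days=3650)


def far_future_expires(access_time: datetime, lifetime: timedelta = TEN_YEARS) -> str:
    """Return an HTTP-date string that is `lifetime` after `access_time`."""
    # Normalize to UTC so the value can be rendered with a "GMT" suffix.
    expires = access_time.astimezone(timezone.utc) + lifetime
    return format_datetime(expires, usegmt=True)


# Access time taken from the Date header of the first test request below.
accessed = datetime(2008, 8, 23, 0, 17, 22, tzinfo=timezone.utc)
print(far_future_expires(accessed))  # Tue, 21 Aug 2018 00:17:22 GMT
```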
That part is easy. The hard part is revving your resource filenames when you make a change. If you make mylogo.gif cacheable for 10 years and then publish a modified version of this file to your servers, users with the old version in their cache won’t get the update. The solution is to rev the name, perhaps by including the file’s timestamp or version number in the URL. But which is better: mylogo.1.2.gif or mylogo.gif?v=1.2? To gain the benefit of caching by popular proxies, avoid revving with a querystring and instead rev the filename itself.
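The two revving styles can be sketched in a few lines of Python. This is only an illustration, not code from the book: the helper names `rev_filename` and `rev_querystring` are hypothetical, and the version string is assumed to be something you supply at build or publish time.

```python
import posixpath


def rev_filename(path: str, version: str) -> str:
    """Embed the version in the filename: mylogo.gif -> mylogo.1.2.gif."""
    root, ext = posixpath.splitext(path)
    return f"{root}.{version}{ext}"


def rev_querystring(path: str, version: str) -> str:
    """Append the version as a querystring: mylogo.gif -> mylogo.gif?v=1.2."""
    return f"{path}?v={version}"


print(rev_filename("mylogo.gif", "1.2"))     # mylogo.1.2.gif (preferred)
print(rev_querystring("mylogo.gif", "1.2"))  # mylogo.gif?v=1.2 (avoid)
```

With the filename style, the file has to be published under the revved name (or the server has to map the revved name back to the real file), which is the extra work this approach costs.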
There’s a section in my book called Revving Filenames. It contains an example of adding a version number to the filename. That’s prompted several emails where people have asked me about tradeoffs around using a querystring versus embedding something in the filename. I wasn’t aware of any performance difference, but in a meeting this week a co-worker, Jacob Hoffman-Andrews, mentioned that Squid, a popular proxy, doesn’t cache resources with a querystring. This hurts performance when multiple users behind a proxy cache request the same file – rather than using the cached version everybody would have to send a request to the origin server.
I tested this by creating two resources, mylogo.1.2.gif and mylogo.gif?v=1.2. Both have a far future Expires date. I configured my browser to go through a Squid proxy. I made one request to mylogo.1.2.gif, cleared my cache (to simulate another user making the request), and fetched mylogo.1.2.gif again. This produces the following HTTP headers:
>> GET http://stevesouders.com/mylogo.1.2.gif HTTP/1.1
<< HTTP/1.0 200 OK
<< Date: Sat, 23 Aug 2008 00:17:22 GMT
<< Expires: Tue, 21 Aug 2018 00:17:22 GMT
<< X-Cache: MISS from someserver.com
<< X-Cache-Lookup: MISS from someserver.com

>> GET http://stevesouders.com/mylogo.1.2.gif HTTP/1.1
<< HTTP/1.0 200 OK
<< Date: Sat, 23 Aug 2008 00:17:22 GMT
<< Expires: Tue, 21 Aug 2018 00:17:22 GMT
<< X-Cache: HIT from someserver.com
<< X-Cache-Lookup: HIT from someserver.com
Notice that the second response shows a HIT in the X-Cache and X-Cache-Lookup headers. This shows it was served by the Squid proxy. More evidence of this is the fact that the Date and Expires response headers have the same values, even though I made these requests 10 seconds apart. For conclusive evidence, only one hit shows up in the stevesouders.com access log.
Requesting mylogo.gif?v=1.2 twice (clearing the cache in between) results in these headers:
>> GET http://stevesouders.com/mylogo.gif?v=1.2 HTTP/1.1
<< HTTP/1.0 200 OK
<< Date: Sat, 23 Aug 2008 00:19:34 GMT
<< Expires: Tue, 21 Aug 2018 00:19:34 GMT
<< X-Cache: MISS from someserver.com
<< X-Cache-Lookup: MISS from someserver.com

>> GET http://stevesouders.com/mylogo.gif?v=1.2 HTTP/1.1
<< HTTP/1.0 200 OK
<< Date: Sat, 23 Aug 2008 00:19:47 GMT
<< Expires: Tue, 21 Aug 2018 00:19:47 GMT
<< X-Cache: MISS from someserver.com
<< X-Cache-Lookup: MISS from someserver.com
Here it’s clear the second response was not served by the proxy: the caching response headers say MISS, the Date and Expires values change, and tailing the stevesouders.com access log shows two hits.
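If you want to repeat this check on your own resources, the header inspection can be automated. The following Python sketch is one possible approach, not the tooling used for the test above: `parse_headers` and `served_by_proxy` are hypothetical helpers, and it assumes the proxy reports its cache status in an X-Cache header that begins with HIT or MISS, as in the responses shown here.

```python
def parse_headers(raw: str) -> dict:
    """Parse 'Name: value' lines into a dict keyed by lowercase header name."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip anything without a colon, such as a status line
            headers[name.strip().lower()] = value.strip()
    return headers


def served_by_proxy(headers: dict) -> bool:
    """True when the X-Cache header reports a HIT (assumed format: 'HIT from host')."""
    return headers.get("x-cache", "").upper().startswith("HIT")


# Second response for each URL, copied from the test output above.
revved_filename = parse_headers("""
Date: Sat, 23 Aug 2008 00:17:22 GMT
Expires: Tue, 21 Aug 2018 00:17:22 GMT
X-Cache: HIT from someserver.com
""")
revved_querystring = parse_headers("""
Date: Sat, 23 Aug 2008 00:19:47 GMT
Expires: Tue, 21 Aug 2018 00:19:47 GMT
X-Cache: MISS from someserver.com
""")

print(served_by_proxy(revved_filename))     # True
print(served_by_proxy(revved_querystring))  # False
```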
Proxy administrators can change the configuration to support caching resources with a querystring, when the caching headers indicate that is appropriate. But the default configuration is what web developers should expect to encounter most frequently. Another interesting note about these tests: notice how the proxy downgrades the responses to HTTP/1.0. This is going to alter browser behavior in terms of the number of connections that are opened. When I’m doing performance analysis I make sure to avoid being connected through a proxy.
It’s important to leverage the caching provided by Squid and other proxies. Users get faster load times if resources are served from their proxy, avoiding the trip all the way back to the origin server. How many users does this affect? You can estimate this by tracking how many requests to your servers contain the Via header. The percentage is 5-25% – it varies depending on your audience and geographic region. For those users who are behind proxies, help foster a faster experience by avoiding a querystring for cacheable resources.
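The Via estimate can be sketched as a simple count. This Python example assumes you can already extract the request headers from your logs as one dict per request; the function name `via_share` and the sample requests are made up for illustration and are not real traffic data.

```python
def via_share(requests: list) -> float:
    """Return the fraction of requests that carry a Via header."""
    if not requests:
        return 0.0
    # Header names are case-insensitive, so compare in lowercase.
    proxied = sum(
        1 for headers in requests
        if any(name.lower() == "via" for name in headers)
    )
    return proxied / len(requests)


# Hypothetical sample: four requests, one of which came through a proxy.
sample = [
    {"Host": "stevesouders.com"},
    {"Host": "stevesouders.com", "Via": "1.0 someserver.com"},
    {"Host": "stevesouders.com"},
    {"Host": "stevesouders.com"},
]
print(f"{via_share(sample):.0%} of requests include a Via header")
```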