Velocity: Forcing Gzip Compression

July 12, 2010 6:57 pm | 25 Comments

Tony Gentilcore was my officemate when I first started at Google. I was proud of my heritage as “the YSlow guy”. After all, YSlow had well over 1M downloads. After a few days I found out that Tony was the creator of Fasterfox, which had topped 11M downloads. Needless to say, we hit it off and had a great year brainstorming techniques for optimizing web performance.

During that time, Tony was working with the Google Search team and discovered something interesting: ~15% of users with gzip-capable browsers were not sending an appropriate Accept-Encoding request header. As a result, they were sent uncompressed responses that were 3x bigger, resulting in slower page load times. After some investigation, Tony discovered that intermediaries (proxies and anti-virus software) were stripping or munging the Accept-Encoding header. My blog post Who’s not getting gzip? summarizes the work with links to more information. Read Tony’s chapter in Even Faster Web Sites for all the details.

Tony is now working on Chrome, but the discovery he made has fueled the work of Andy Martone and others on the Google Search team to see if they could improve page load times for users who weren’t getting compressed responses. They had an idea:

  1. For requests with missing or mangled Accept-Encoding headers, inspect the User-Agent to identify browsers that should understand gzip.
  2. Test their ability to decompress gzip.
  3. If successful, send them gzipped content!

This is a valid strategy given that the HTTP spec says that, in the absence of an Accept-Encoding header, the server may send a different content encoding based on additional information (such as the encodings known to be supported by the particular client).
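
The first step, deciding which requests are worth testing, might look like the following. This is a minimal sketch and not Google's implementation: the function names and the User-Agent pattern are assumptions made for illustration, and header names are assumed to arrive lowercased, as Node's http module delivers them.

```javascript
// Hypothetical allow-list; a real deployment would maintain its own list of
// browsers known to decode gzip.
const GZIP_CAPABLE_UA = /\b(Firefox|Chrome|Safari|MSIE|Opera)\b/;

// True when the request advertises gzip in Accept-Encoding.
// q-values are ignored in this sketch.
function advertisesGzip(headers) {
  const value = String(headers['accept-encoding'] || '').toLowerCase();
  return value.split(',').some((item) => {
    const coding = item.split(';')[0].trim();
    return coding === 'gzip' || coding === 'x-gzip';
  });
}

// True when Accept-Encoding does not advertise gzip (for example because it
// is missing or mangled), yet the User-Agent suggests a browser that should
// understand gzip.
function isForcedGzipCandidate(headers) {
  if (advertisesGzip(headers)) return false; // normal negotiation already works
  return GZIP_CAPABLE_UA.test(String(headers['user-agent'] || ''));
}
```

A candidate is only eligible for the decompression test; it should not receive compressed responses until that test has succeeded.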

During his presentation at Velocity, Forcing Gzip Compression, Andy describes how Google Search implemented this technique:

  • At the bottom of a page, inject JavaScript to:
    • Check for a cookie.
    • If absent, set a session cookie saying “compression NOT ok”.
    • Write out an iframe element to the page.
  • The browser then makes a request for the iframe contents.
  • The server responds with an HTML document that is always compressed.
  • If the browser understands the compressed response, it executes the inlined JavaScript and sets the session cookie to “compression ok”.
  • On subsequent requests, if the server sees the “compression ok” cookie it can send compressed responses.
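
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code Google Search runs: the cookie name `gz`, its values `0` and `1`, the test URL, and all function names are hypothetical. The `doc` parameter stands for the browser's `document`, passed in so the logic can also be exercised outside a browser.

```javascript
const COOKIE_NAME = 'gz'; // hypothetical cookie name

// Return the value of a cookie from a "k=v; k2=v2" string, or null.
function readCookie(cookieString, name) {
  for (const pair of String(cookieString || '').split(';')) {
    const eq = pair.indexOf('=');
    if (eq === -1) continue;
    if (pair.slice(0, eq).trim() === name) return pair.slice(eq + 1).trim();
  }
  return null;
}

// Injected at the bottom of the page. Returns true if the test was started.
function startCompressionTest(doc, testUrl) {
  if (readCookie(doc.cookie, COOKIE_NAME) !== null) return false; // already tested
  doc.cookie = COOKIE_NAME + '=0; path=/'; // session cookie: "compression NOT ok"
  const iframe = doc.createElement('iframe');
  iframe.style.display = 'none';
  iframe.src = testUrl; // the server always compresses this response
  doc.body.appendChild(iframe);
  return true;
}

// Inlined in the always-compressed iframe document. It only runs if the
// browser managed to decode the response.
function markCompressionOk(doc) {
  doc.cookie = COOKIE_NAME + '=1; path=/'; // "compression ok"
}

// Server side: may this request receive a compressed response?
function cookieAllowsGzip(cookieHeader) {
  return readCookie(cookieHeader, COOKIE_NAME) === '1';
}
```

In a page, the call would be something like `startCompressionTest(document, '/gziptest')`, where `/gziptest` is a placeholder path. Setting the cookie to "NOT ok" before the iframe loads means that a browser which chokes on the compressed response is simply left in the safe state.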

The savings are significant. An average Google Search results page is 34 kB, which compresses down to 10 kB. The ability to send a compressed response cuts page load times by ~15% for these affected users.
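
To put the byte savings in perspective, going from 34 kB to 10 kB removes roughly 71% of the bytes. The sketch below does that arithmetic and shows one way to measure the same ratio for your own markup using Node's built-in `zlib` module. The helper names and the sample markup are assumptions for illustration; the ~15% page load improvement is a measured figure and cannot be derived from byte counts alone.

```javascript
const zlib = require('zlib');

// Fraction of bytes saved when `original` shrinks to `compressed`.
function savedFraction(original, compressed) {
  return 1 - compressed / original;
}

// Gzip a string and report both sizes in bytes.
function gzipSizes(text) {
  const input = Buffer.from(text, 'utf8');
  const output = zlib.gzipSync(input);
  return { original: input.length, compressed: output.length };
}

// The article's figures: 34 kB down to 10 kB.
console.log((savedFraction(34, 10) * 100).toFixed(0) + '% fewer bytes');

// Hypothetical sample markup, standing in for a real results page.
const sampleMarkup = '<li class="result"><a href="#">title</a></li>\n'.repeat(500);
const sampleSizes = gzipSizes(sampleMarkup);
console.log(sampleSizes.original + ' bytes -> ' + sampleSizes.compressed + ' bytes');
```

Highly repetitive markup like this sample compresses far better than a real page would, so measure with representative responses.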

Andy’s slides contain more details about how to run the test only once, recommended cookie lifetimes, and how to serve the iframe. Since this discovery I’ve talked to folks at other web sites who confirm that they also see these mysterious requests with no Accept-Encoding header. Check it out on your web site: 15% is a significant slice of users! If you’d like to improve their page load times, take Andy’s advice and send them a compressed response that is smaller and faster.

25 Responses to Velocity: Forcing Gzip Compression

  1. What client legitimately does not understand gzip encoding at this point in time anyway?

  2. @Steven – it’s not a matter of the client not accepting gzip. The problem mostly exists because certain desktop security software programs mangle the Accept-Encoding headers.

  3. >The ability to send a compressed
    >response cuts page load times by ~15%
    >for these affected users.

    Is the 15 a typo? I mean… there is latency and slowly starting TCP/IP, but I expected that number to be a little bit higher than that.

  4. @Mike

    Steven probably meant that testing isn’t necessary and you could just throw gzipped stuff at every request you get.

  5. @Jos,

    in that case, the question is – Do the security programs mangle the Encoding headers of the response?

  6. Aren’t we penalizing browsers with no gzip support by asking them to run the iframe test on every page load?

  7. Many personal firewalls strip the Accept-Encoding gzip header from browsers on purpose so they can examine the stream that comes in for items that should be blocked.

    Why they do not attempt to decode the stream instead is unknown, maybe a lazy programmer’s solution.

  8. Can anybody name 1 browser that doesn’t support gzip? (I mean browsers of this century of course)

  9. Might it cause a problem if you don’t set the proper “no-cache/no-store” headers together with the gzipped content, since it might confuse the intermediate proxies?

  10. @Jos: I’ve never seen an intermediary that mangles the Content-Encoding response header.

    @Shashi: The test is only done once, as described in the slides.

    _ck_ hit the nail on the head. I don’t think any of these intermediaries thought “cool! we can make stuff slow!” Instead, it was perhaps simpler to deal with clear text responses. We’re actively trying to educate and fix these intermediaries, in conjunction with this forced compression approach.

    @tszming: Typically on compressed responses you’ll end up setting Vary: Accept-Encoding and possibly even Cache-Control: private.

  11. @Jos Hirth: the 15% is a mean, and there are still several external resources that factor into page load time.

    @Shashi Kant: yes, this does penalize users behind affected proxies or security software that fail the test. However, the test is performed only on the first request of a session.

  12. We had a case where GData antivirus stripped off our Content-Encoding header but left the content compressed.

    Now granted, the compression used in this case was non-standard (Accept-Encoding: x-ps-bzip2), so GData could not decompress it. But IMHO, stripping off the Content-Encoding header in that situation is a very bad thing and doesn’t bode well for the technique you are describing here.

    In our case, I controlled both client and server and really wanted to use bzip2, because huge amounts of XML data were being transmitted and bzip2 worked significantly better. In the end we dropped all the standards-compliant headers that AV software ruined, and used the User-Agent and modified Content-Type headers to signal that compression was in use.

    *sigh*

  13. I guess the golden question (it wasn’t addressed in the slides either) is:

    What percentage of users still could NOT accept gzip encoding even after the cookie/iframe/javascript test?

    Also, this article [1] states that the only reasons users don’t get compressed content (excluding websites that just don’t serve it) are all due to intermediaries (anti-virus software, browser bugs, web proxies) mangling the Accept-Encoding header. It says nothing of users who actually cannot accept gzip encoding.

    In other words, it seems that the users who actually browse with agents that, for whatever reason, cannot understand gzip encoding are negligible or non-existent.

    Of course this could be verified or refuted if you know what percentage of users fail the forced-gzip test.

  14. Oops, I forgot to link the article I was referring to in [1] above:

    [1]: http://code.google.com/speed/articles/use-compression.html

  15. Yes, I meant, what kind of problem would there be if you sent gzip to everyone all the time. I imagine that the “security” software that is mangling Accept Headers will just assume that it is binary data. Anyhow, those people that have that installed likely have all sorts of sporadic problems.

  16. This is now a WordPress plugin (first comment):

    http://wpdevel.wordpress.com/2010/07/13/forcing-gzip-httpwww-stevesouders-co/

  17. I suspect that the natural response from intermediary proxy authors would be (at least in the short term) to send an “Accept-Encoding: identity” header in their requests, making it non-standard to send forced gzip responses. That would be a lot simpler for them to implement than decompressing the responses.

  18. Just a thought:

    Might it be possible to dynamically inject the script into the document by conditionally ‘chunking’ it to the end, and thus removing the need for an additional call and iFrame?

    Or am I missing the mark?

  19. Hi Steve,

    On a daily basis, we see that about 10% of the traffic on our website does not have Accept-Encoding set. We implemented forced gzip compression after listening to the Velocity talk. It is such a clever idea, and our implementation is working in production, but for some reason we are only able to send compressed content back to less than 3% of the requests, definitely not as many as Google is able to. In fact, more than 3-4% of requests ran the test but still do not accept gzip compression. Any suggestions on why we are getting such a low rate with this feature? Could it be the way our test servlet is written?

  20. @Shannon: Andy’s preso does not mention what percentage of users were able to receive forced compression. If the test says 3-4%, that might be it for your audience.

  21. What about using an image rather than an iframe ?

    I mean:
    a = new Image;
    a.onload = function(){…};
    a.src = 'gzipped-blank-image';

    I tested it on my localhost and it seems to work equally well. Do you have any data about iframe vs. Image?

    Image seems much simpler to me, because the iframe needs special code to prevent a loading bar, needs expensive DOM usage, etc.

    Any thoughts about this?

  22. @Nicolas: Images aren’t compressed so it doesn’t test the browser’s ability to handle compressed content.

  23. Even if it’s useless, browsers do support compressed images. I tested the PHP code below and it works perfectly. So we could replace your iframe with this, if it’s an enhancement of course (it works on my local setup, that’s all I know right now :) )

    a = new Image;
    a.onload = function(){alert('OK');};
    a.onerror = function(){alert('KO');};
    a.src = '?i=1';

  24. Oops, try again:

    <?php

    if (isset($_GET['i'])):

    header('Content-Encoding: deflate');

    // A base64 encoded transparent 1×1 GIF
    $img = 'R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==';

    echo gzdeflate(base64_decode($img));

    else: ?>

    <script>
    a = new Image;
    a.onload = function(){alert('OK');};
    a.onerror = function(){alert('KO');};
    a.src = '<?php echo basename(__FILE__);?>?i=1';
    </script>

    <?php endif;?>

  25. Along the same lines as Nicolas, is there a distinct need for the iframe? That is, instead of an iframe, could an always compressed javascript src be served?