Zopflinator

March 8, 2013 3:03 pm | 10 Comments

Last week Google open sourced Zopfli – a new data compression library whose output is Deflate-compatible (and therefore a drop-in replacement for Deflate that requires no changes to the browser). The post mentioned that the “output generated by Zopfli is typically 3–8% smaller compared to zlib”, but Zopfli takes longer to compress files – “2 to 3 orders of magnitude more than zlib at maximum quality”. 2 to 3 orders of magnitude longer! When I read that I imagined Zopfli might take seconds or even minutes to compress a 500K script.

Also, even after reading the detailed PDF of results it still wasn’t clear to me how Zopfli would perform on typical web content. The PDF mentioned four corpora. One was called “Alexa-top-10k” which is cool – that’s real web stuff. But was it just the HTML for those sites? Or did it also include scripts and stylesheets? And did it go too far and include images and Flash? I clicked on the link in the footnotes to get more info and was amused when it opened a list of URLs from my HTTP Archive project! That was gratifying but didn’t clarify which responses were analyzed from that list of sites.

The Canterbury Corpus was also listed. Seriously?! Isn’t Chaucer too old to be relevant to new compression algorithms? Turns out it’s not based on Canterbury Tales. (Am I the only one who jumped to that conclusion?) Instead, it’s a standard set of various types of files but most of them aren’t representative of content on the Web.

Finally, it mentioned “zlib at maximum quality”. AFAIK most sites run the default compression built into the server, for example mod_deflate for Apache. While it’s possible to change the compression level, I assume most site owners go with the default. (That would be an interesting and easy question to answer.) I had no idea how “zlib at maximum quality” compared to default web server compression.
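For context: with Apache’s mod_deflate the level is set by the DeflateCompressionLevel directive; when it’s unset, Apache falls back to zlib’s default (level 6, as I understand it), and 9 is the maximum. A minimal sketch of the relevant config (which content types to compress is just an illustrative choice):

    # httpd.conf - minimal mod_deflate sketch
    LoadModule deflate_module modules/mod_deflate.so
    AddOutputFilterByType DEFLATE text/html text/css application/javascript
    # Unset = zlib's default (level 6). "Maximum quality" would be:
    DeflateCompressionLevel 9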

My thirst for Zopfli information went unquenched. I wanted a better idea of how Zopfli performed on typical web resources in comparison to default server compression settings, and how long it took.

Luckily, before I got off Twitter, Ilya pointed me to some Zopfli installation instructions from François Beaufort. That’s all I needed to spend the rest of the morning building a tool that I and anyone else can use to see how Zopfli performs:

Zopflinator

It’s easy to use. Just enter the URL of a web resource and submit. Here’s an example for jQuery 1.9.1 served from Google Hosted Libraries.

Notes:

  • The form doesn’t accept URLs for HTML, images, or Flash. I added HTML to that list because Zopfli makes the most sense for static resources, given the longer compression times.
  • By default the results are recorded in my DB and shown lower on the page. It’s anonymous – no IPs, User-Agent strings, etc. are saved – just the URL, content-type, and compression results. But sometimes people want to test URLs for stealth sites – that’s a good reason to UNcheck the crowdsource checkbox.
  • Four file sizes are shown. (The percentages are based on comparison to the uncompressed file.)
    • Uncompressed file size is the size of the content without any compression.
    • mod_deflate size is based on serving the content through my Apache server using mod_deflate’s default settings.
    • The gzip -9 size is based on running gzip -9 from the commandline on the uncompressed content.
    • Zopfli results are likewise based on running the Zopfli install from the commandline. (A sketch for reproducing these last two numbers follows this list.)
  • Folks who don’t want to install Zopfli can download the resultant file.
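If you want to reproduce the gzip -9 and Zopfli numbers yourself, here’s a minimal sketch. The repository URL is an assumption (the project’s hosting has moved around), the make target may vary by version, and the jQuery URL is the Google Hosted Libraries copy from the example above:

    # Build Zopfli from source (repository URL is an assumption)
    git clone https://github.com/google/zopfli.git
    cd zopfli && make zopfli

    # Fetch an uncompressed copy of the resource
    curl -s -o jquery.js https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js

    # Compress with gzip at maximum quality, then with Zopfli
    gzip -9 -c jquery.js > jquery-gzip9.js.gz
    ./zopfli jquery.js    # default output is gzip format: jquery.js.gz

    # Compare sizes
    ls -l jquery.js jquery-gzip9.js.gz jquery.js.gz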

The results in this example are pretty typical – Zopfli is usually 1% lower than gzip -9, and gzip -9 is usually 1% lower than mod_deflate. 1% doesn’t sound like much, but remember that this 1% is relative to the original uncompressed response. The more relevant comparison is between Zopfli and mod_deflate, since that’s what’s actually deployed – about 70% of compressible responses are compressed today (across the world’s top 300K websites). The primary question is: how much less data would be transferred if those responses were instead compressed with Zopfli?

In the day since Zopflinator launched, 238 URLs have been run through it. The average sizes are:

  • uncompressed: 106,272 bytes
  • mod_deflate: 28,732 bytes (27%)
  • gzip -9: 28,062 bytes (26%)
  • Zopfli: 27,033 bytes (25%)

The average Zopfli savings over mod_deflate is 1,699 bytes (5.9% of the mod_deflate size). That’s a pretty significant savings. Just imagine if all script and stylesheet transfer sizes were reduced by ~6% – that would reduce the number of roundtrips as well as mobile data usage.

Previously I mentioned that Zopfli has compression times that are “2 to 3 orders of magnitude more than zlib at maximum quality”. When running Zopfli from the commandline it seemed instantaneous – so my fears of it taking seconds or perhaps minutes were unfounded. Of course, this is based on my perception – the actual CPU time consumed by the web server at scale could be significant. That’s why it’s suggested that Zopfli might be best suited for static content. A great example is Google Hosted Libraries.
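If you’d rather not rely on my perception, a quick way to compare wall-clock times on your own machine, using the jquery.js from the earlier sketch (zopfli’s -c flag writes to stdout; numbers will obviously vary by hardware and file):

    # Wall-clock comparison of gzip -9 vs. Zopfli on the same file
    time gzip -9 -c jquery.js > /dev/null
    time ./zopfli -c jquery.js > /dev/null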

Of course, anyone using Zopfli out-of-band has to configure their server to use these pre-generated files for compressed responses. So far I haven’t seen examples of how to do this. Ilya pointed out that AWS and Google Cloud Storage already work this way – you upload compressed as well as uncompressed versions of your web resources – so Zopfli fits well if you use those services. For people who currently compress on the fly (e.g., using mod_deflate), the effort of switching to Zopfli is likely to outweigh the benefits as long as Zopfli can’t be run in realtime.

I hope to see adoption of Zopfli where it makes sense. If you use Zopfli to compress your resources, please add a comment with feedback on your experience.

10 Responses to Zopflinator

  1. Cloning and building Zopfli on my Macbook Air only took a minute or two, which was nice. I was curious about potential savings as well, so I ran the exact same test with the jQuery file that you did – http://josephscott.org/archives/2013/03/zopfli-compression/

    You mentioned that running Zopfli seemed instantaneous, but my tests on a Macbook Air (i7, 2 GHz) took over 16 seconds. Were you running “zopfli --i1000”?

  2. I think there is a lot more scope for reducing bandwidth in smart site architecture than in throwing hardware at the problem.

    Using this site as a (petty) example:

    This page is 15,211 bytes, and standard gzip makes it 5,943 bytes.

    I edited the page to extract an HTML document that contains only the unique content, ignoring the content that is common to every page in your blog.

    This page was 9,254 bytes, which compresses to 3,972 bytes.

    So every page in your blog has about 2KB (gzipped) of shared content. If I visit 5 pages on your blog then I download that shared content every time – about 8KB (gzipped) of unnecessary download beyond the first page. This duplicated content could be loaded with AJAX and cached on the first page visit, which would actually improve load times slightly. (I understand this is petty for your site, but for some sites it is a lot more than this.)

    Soon you will have comments on this page, which everyone will download whether they are going to read them or not.
    In the page markup you could provide a link to the comments, and then use AJAX to load them in when someone wants them. (Again, petty on this site, but on some sites it adds up.)

  3. “Previously I mentioned that Zopfli has compression times that are “2 to 3 orders of magnitude more than zlib at maximum quality”.”

    Still feels like a relevant comparison would be against mod_deflate with

    DeflateCompressionLevel 9

    Turning this on consumes noticeably less CPU than Zopfli (like you say, 2-3 orders of magnitude). Not saying Zopfli isn’t a good idea, but there’s really no reason not to set mod_deflate to 9, IMO.

    Cheers,

    — Leif

  4. Joseph: I was running Zopfli on my web server with no options, i.e., the default number of iterations rather than --i1000.

    Sean: More complex than fits in comments. Loading templates dynamically defeats the browser’s lookahead parser. There are UX questions about which content users actually want and how long to make them wait, etc. Good to think about – no single easy solution.

    Leif: I know some big sites that disable compression when load is too high, so I assume compression does affect availability – and increasing the compression level would do so even more. I agree it would be interesting to compare, but I still think measuring against the current defaults is more valuable.

  5. It would make even more sense if you also compared it against the two other Deflate engines that were mentioned in the PDF (from 7-Zip and KZIP), since those two already perform a bit better than zlib.
    And here are claims that Zopfli output could still be further reduced by another tool:
    http://encode.ru/threads/1689-Google-Compress-Data-More-Densely-with-Zopfli?p=32545&viewfull=1#post32545

  6. You mention that it would be “interesting and easy” to see whether content is being compressed with the default settings. Is this something that the HTTP Archive tracks? How could it be implemented (short of recompressing the original at every level and comparing the output)?

  7. Aline: Thanks for the pointers.

    Eric: To determine if web servers are using the “default” settings I would just compare the origin server’s compressed size to the one from my server (presumably the “default”). It’s more complicated than that (Apache vs IIS vs other servers, whether my server really represents the default, etc). This isn’t a default metric in the HTTP Archive, but it would be a good one-off analysis for a performance researcher to conduct.
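
    A rough commandline version of that comparison (a sketch with a hypothetical URL, assuming the origin honors Accept-Encoding: gzip):

        # Size as served by the origin (without --compressed, curl saves
        # the gzipped bytes as-is)
        curl -s -H 'Accept-Encoding: gzip' -o origin.gz https://example.com/app.js

        # Recompress the uncompressed original at zlib's default level (6)
        curl -s -o app.js https://example.com/app.js
        gzip -6 -c app.js > local.gz

        # If the sizes are close, the origin is probably on default settings
        ls -l origin.gz local.gz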

  8. Hi!

    I really like your tool, but when I tried one of my JS files it said: Unable to determine Content-Type for “https://[sanitize].js”.
    There was an error.

  9. Gabor: Zopflinator looks for the Content-Type response header. If that’s not present then the URL isn’t tested.
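
    For anyone hitting this, one way to check what Content-Type your server sends (a sketch with a hypothetical URL):

        # -I sends a HEAD request and prints the response headers
        curl -sI https://example.com/app.js | grep -i '^content-type'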

  10. Of course, anyone using Zopfli out-of-band has to configure their server to use these pre-generated files for compressed responses. So far I haven’t seen examples of how to do this.

    Here is a guide on how to serve Zopfli-precompressed content with nginx: http://www.cambus.net/serving-precompressed-content-with-nginx-and-zopfli/
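
    The nginx side boils down to something like this (a minimal sketch using the stock gzip_static module – nginx must be built with --with-http_gzip_static_module, and the .gz files are generated ahead of time with Zopfli):

        # Precompress at deploy time:  zopfli app.js  ->  app.js.gz
        # nginx.conf, inside the relevant server/location block:
        gzip_static on;    # serve app.js.gz to clients that accept gzip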