Last week Google open sourced Zopfli – a new data compression library based on Deflate (and therefore a drop-in replacement for Deflate without requiring any changes to the browser). The post mentioned that the “output generated by Zopfli is typically 3–8% smaller compared to zlib” but Zopfli takes longer to compress files – “2 to 3 orders of magnitude more than zlib at maximum quality”. 2 to 3 orders of magnitude longer! When I read that I was thinking Zopfli might take seconds or perhaps minutes to compress a 500K script.
Also, even after reading the detailed PDF of results it still wasn’t clear to me how Zopfli would perform on typical web content. The PDF mentioned four corpora. One was called “Alexa-top-10k” which is cool – that’s real web stuff. But was it just the HTML for those sites? Or did it also include scripts and stylesheets? And did it go too far and include images and Flash? I clicked on the link in the footnotes to get more info and was amused when it opened a list of URLs from my HTTP Archive project! That was gratifying but didn’t clarify which responses were analyzed from that list of sites.
The Canterbury Corpus was also listed. Seriously?! Isn’t Chaucer too old to be relevant to new compression algorithms? Turns out it’s not based on Canterbury Tales. (Am I the only one who jumped to that conclusion?) Instead, it’s a standard set of various types of files but most of them aren’t representative of content on the Web.
Finally, it mentioned “zlib at maximum quality”. AFAIK most sites run the default compression built into the server, for example mod_deflate for Apache. While it’s possible to change the compression level, I assume most site owners go with the default. (That would be an interesting and easy question to answer.) I had no idea how “zlib at maximum quality” compared to default web server compression.
My thirst for Zopfli information went unquenched. I wanted a better idea of how Zopfli performed on typical web resources in comparison to default server compression settings, and how long it took.
Luckily, before I got off Twitter Ilya pointed to some Zopfli installation instructions from François Beaufort. That’s all I needed to spend the rest of the morning building a tool that I and anyone else can use to see how Zopfli performs:
It’s easy to use. Just enter the URL to a web resource and submit. Here’s an example for jQuery 1.9.1 served from Google Hosted Libraries.
- The form doesn’t accept URLs for HTML, images, and Flash. I added HTML to that list because Zopfli makes the most sense for static resources due to the longer compression times.
- By default the results are recorded in my DB and shown lower in the page. It’s anonymous – no IPs, User-Agent strings, etc. are saved – just the URL, content-type, and compression results. But sometimes people want to test URLs for stealth sites – so that’s a good reason to UNcheck the crowdsource checkbox.
- Four file sizes are shown. (The percentages are based on comparison to the uncompressed file.)
- Uncompressed file size is the size of the content without any compression.
- mod_deflate size is based on serving the content through my Apache server using mod_deflate’s default settings.
- The gzip -9 size is based on running that from the commandline on the uncompressed content.
- Zopfli results are also based on running the Zopfli install from the commandline.
- Folks who don’t want to install Zopfli can download the resultant file.
The results in this example are pretty typical – Zopfli is usually 1% lower than gzip -9, and gzip -9 is usually 1% lower than mod_deflate. 1% doesn’t seem like that much, but remember that 1% is relative to the original uncompressed response. The more relevant comparison is between Zopfli and mod_deflate. About 70% of compressible responses are actually compressed (across the world’s top 300K websites). The primary question is how much less data would be transferred if responses were instead compressed with Zopfli?
One day after being launched, 238 URLs have been run through Zopflinator. The average sizes are:
- uncompressed: 106,272 bytes
- mod_deflate: 28,732 bytes (27%)
- gzip -9: 28,062 bytes (26%)
- Zopfli: 27,033 bytes (25%)
The average Zopfli savings over mod_deflate is 1,699 bytes (5.9%). This is pretty significant savings. Just imagine if all script and stylesheet transfer sizes were reduced by 6% – that would reduce the number of roundtrips as well as reducing mobile data usage.
Previously I mentioned that Zopfli has compression times that are “2 to 3 orders of magnitude more than zlib at maximum quality”. When running Zopfli from the commandline it seemed instantaneous – so my fears of it taking seconds or perhaps minutes were unfounded. Of course, this is based on my perception – the actual CPU time consumed by the web server at scale could be significant. That’s why it’s suggested that Zopfli might be best suited for static content. A great example is Google Hosted Libraries.
Of course, anyone using Zopfli out-of-band has to configure their server to use these pre-generated files for compressed responses. So far I haven’t seen examples of how to do this. Ilya pointed out that AWS and Google Cloud Storage work this way already – where you have to upload uncompressed as well as compressed versions of web resources. Zopfli fits well if you use those services. For people who currently do compression on the fly (e.g., using mod_deflate) the change to use Zopfli is likely to outweigh the benefits as long as Zopfli can’t be run in realtime.
I hope to see adoption of Zopfli where it makes sense. If you use Zopfli to compress your resources, please add a comment with feedback on your experience.