Storager case study: Bing, Google

March 28, 2011 9:26 pm | 23 Comments

Storager

Last week I posted my mobile comparison of 11 top sites. One benefit of analyzing top websites is finding new best practices. In that survey I found that the mobile version of Bing used localStorage to reduce the size of their HTML document from ~200 kB to ~30 kB. This is a good best practice in general and makes even more sense on mobile devices where latencies are higher, caches are smaller, and localStorage is widely supported.

I wanted to further explore Bing’s use of localStorage for better performance. One impediment is that there’s no visibility into localStorage on a mobile device. So I created a new bookmarklet, Storager, and added it to the Mobile Perf uber bookmarklet. (In other words, just install Mobile Perf – it bundles Storager and other mobile bookmarklets.)

Storager lets you view, edit, clear, and save localStorage for any web page on any browser – including mobile. Viewing localStorage on a 320×480 screen isn’t ideal, so I did the obvious next step and integrated Storager with Jdrop. With these pieces in place I’m ready to analyze how Bing uses localStorage.

Bing localStorage

My investigation begins by loading Bing on my mobile device – after the usual redirects I end up at the URL http://m.bing.com/?mid=10006. Opening Storager from the Mobile Perf bookmarklet I see that localStorage has ~10 entries. Since I’m not sure when these were written to localStorage I clear localStorage (using Storager) and hit reload. Opening Storager again I see the same ~10 entries and save those to Jdrop. I show the truncated entries below. I made the results public so you can also view the Storager results in Jdrop.

BGINFO: {"PortraitLink":"http://www.bing.com/fd/hpk2/Legzira_EN-US262...
CApp.Home.FD66E1A3: #ContentBody{position:relative;overflow:hidden;height:100%;-w...
CUX.Keyframes.B8625FE...: @-webkit-keyframes scaleout{from{-webkit-transform:scale3d(1,...
CUX.Site.18BDD936: *{margin:0;padding:0}table{border-collapse:separate;border-sp...
CUX.SiteLowRes.C8A1DA...: .blogoN{background-image:url(data:image/png;base64,iVBORw0KGg...
JApp.Home.DE384EBF: (function(){function a(){Type.registerNamespace("SS");SS.Home...
JUX.Compat.0907AAD4: function $(a){return document.getElementById(a)}var FireEvent...
JUX.FrameworkCore.A39...: (function(){function a(){Type.registerNamespace("BM");AjaxSta...
JUX.MsCorlib.172D90C3: window.ss={version:"0.6.1.0",isUndefined:function(a){return a...
JUX.PublicJson.540180...: if(!this.JSON)this.JSON={};(function(){function c(a){return a...
JUX.UXBaseControls.25...: (function(){function a(){Type.registerNamespace("UXControls")...
RMSM.Keys: CUX.Site.18BDD936~CUX.Keyframes.B8625FEE~CApp.Home.FD66E1A3~C...

These entries are written to localStorage as part of downloading the Bing search page. These entries add up to ~170 kB in size (uncompressed). This would explain the large size of the Bing HTML document on mobile. We can verify that these keys are downloaded via the HTML document by searching for a unique string from the data such as “FD66E1A3”. We find this string in the Bing document source (saved in Jdrop) as the id of a STYLE block:

<style data-rms="done" id="CApp.Home.FD66E1A3" rel="stylesheet" type="text/css">
#ContentBody{position:relative;overflow:hidden;height:100%;-webkit-tap-highlight-color:...

Notice how the content of this STYLE block matches the data in localStorage. The other localStorage entries also correspond to SCRIPT and STYLE blocks in the initial HTML document. Bing writes these blocks to localStorage and then, on subsequent page views, reads them back and inserts them into the document, resulting in a much smaller HTML document download. The Bing server knows which blocks are in the client’s localStorage via a cookie made up of the localStorage keys delimited by “~”:

RMSM=JApp.Home.DE384EBF~JUX.UXBaseControls.252CB7BF~JUX.FrameworkCore.A39F6425~
JUX.PublicJson.540180A4~JUX.Compat.0907AAD4~JUX.MsCorlib.172D90C3~CUX.SiteLowRes.C8A1DA4E~
CApp.Home.FD66E1A3~CUX.Keyframes.B8625FEE~CUX.Site.18BDD936~;
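The store, cookie, and restore flow described above can be sketched roughly as follows. This is not Bing's actual code: MemoryStorage, storeBlocks, blocksToInline, and restoreBlocks are hypothetical names, and an in-memory stand-in replaces localStorage so the sketch runs outside a browser.

```javascript
// In-memory stand-in for window.localStorage (assumption: only
// getItem/setItem are needed for this sketch).
class MemoryStorage {
  constructor() { this.items = new Map(); }
  getItem(key) { return this.items.has(key) ? this.items.get(key) : null; }
  setItem(key, value) { this.items.set(key, String(value)); }
}

// First view: save each inlined block under its id and return a cookie
// value listing the stored keys, delimited by "~" as in the RMSM cookie.
function storeBlocks(storage, blocks) {
  const keys = [];
  for (const [id, content] of Object.entries(blocks)) {
    storage.setItem(id, content);
    keys.push(id);
  }
  return keys.join("~") + "~";
}

// Server side: given the cookie value, work out which blocks still have to
// be inlined in the HTML response.
function blocksToInline(cookieValue, requiredIds) {
  const cached = new Set(cookieValue.split("~").filter(Boolean));
  return requiredIds.filter((id) => !cached.has(id));
}

// Later views: read the blocks back so they can be inserted into the page.
function restoreBlocks(storage, ids) {
  const restored = {};
  for (const id of ids) {
    const content = storage.getItem(id);
    if (content !== null) restored[id] = content;
  }
  return restored;
}
```

In a browser, window.localStorage could be passed in place of MemoryStorage, since it exposes the same getItem and setItem methods.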

Just to be clear, everything above happens during the loading of the blank Bing search page. Once a query is issued the search results page downloads more keys (~95 kB additional data) and expands the cookie with the new key names.

Google localStorage

Another surprise from last week’s survey was that the mobile version of Google Search had 68 images in the results HTML document as data: URIs, compared to only 10 for desktop and iPad. Mobile browsers open fewer TCP connections and these connections are typically slower compared to desktop, so reducing the number of HTTP requests is important.

The additional size from inlining data: URIs doesn’t account for the large size of the Google Search results page, so perhaps localStorage is being seeded here, too. Using Storager we see over 130 entries in localStorage after a search for flowers. Here’s a sample. (As before, the key names and values may be truncated.)

 mres.-8Y5Dw_nSfQztyYx: <style>a{color:#11c}a:visited{color:#551a8b}body{margin:0;pad...
 mres.-Kx7q38gfNkQMtpx: <script> //<![CDATA[ var Zn={},bo=function(a,b){b&&Zn[b]||(ne...
 mres.0kH3gDiUpLA5DKWN: <style>.zl9fhd{padding:5px 0 0}.sc59bg{clear:both}.pyp56b{tex...
 mres.0thHLIQNAKnhcwR4: <style>.fdwkxt{width:49px;height:9px;background:url("data:ima...
 mres.36ZFOahhhEK4t3WE: <script> //<![CDATA[ var kk,U,lk;(function(){var a={};U=funct...
 mres.3lEpts5kTxnI2I5S: <script> //<![CDATA[ var Ec,Fc,Gc=function(a){this.Jl=a},Hc="...
 mres.4fbdvu9mdAaBINjE: <script> //<![CDATA[ u("_clOnSbt",function(){var a=document.g...
 mres.5QIb-AahnDgEGlYP: <script> //<![CDATA[ var cb=function(a){this.Cc=a},db=/\s*;\s...
 mres:time.-8Y5Dw_nSfQ...: 1301368541872
 mres:time.-Kx7q38gfNk...: 1301368542755
 mres:time.0kH3gDiUpLA...: 1301368542257
 mres:time.0thHLIQNAKn...: 1301368542223
 mres:time.36ZFOahhhEK...: 1301368542635
 mres:time.3lEpts5kTxn...: 1301368542579
 mres:time.4fbdvu9mdAa...: 1301368542720
 mres:time.5QIb-AahnDg...: 1301368542856

Searching the search results document source for a unique key such as “8Y5D” we find:

<style id="r:-8Y5Dw_nSfQztyYx" type="text/css">
a{color:#11c}a:visited{color:#551a8b}body{margin:0;padding:0}...

Again we see that multiple SCRIPT and STYLE blocks are being saved to localStorage totaling 154 kB. On subsequent searches the HTML document size drops from the initial size of 220 kB uncompressed (74 kB compressed) to 67 kB uncompressed (16 kB compressed). In addition to the key names being saved in a cookie, it appears that an epoch time (in milliseconds) is associated with each key.
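The mres:time values are plain epoch milliseconds, so they are easy to decode. The sketch below converts one of the values listed above to a readable date. The isFresh helper is purely a guess at how such a timestamp might be used; the source does not say what Google does with it.

```javascript
// Convert an epoch-milliseconds value (as stored under mres:time.*) to an
// ISO 8601 string in UTC.
function epochMsToIso(ms) {
  return new Date(Number(ms)).toISOString();
}

// Hypothetical freshness check: is the entry no older than maxAgeMs at `now`?
function isFresh(storedMs, now, maxAgeMs) {
  return now - Number(storedMs) <= maxAgeMs;
}

console.log(epochMsToIso("1301368541872")); // 2011-03-29T03:15:41.872Z
```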

Conclusion

Bing and Google Search make extensive use of localStorage for stashing SCRIPT and STYLE blocks that are used on subsequent page views. None of the other top sites from my previous post use localStorage in this way. Are Bing and Google Search onto something? Yes, definitely. As I pointed out in my previous post, this is another example of a performance best practice that is used on a top mobile site but is not in the recommendations from Page Speed or YSlow. Many of the performance best practices that I’ve evangelized over the last six years for desktop apply to mobile, but I believe there are specific mobile best practices that we’re just beginning to identify. I’ve started using “High Performance Mobile” as the title of future presentations. Another book? hmmm….


Velocity: Forcing Gzip Compression

July 12, 2010 6:57 pm | 25 Comments

Tony Gentilcore was my officemate when I first started at Google. I was proud of my heritage as “the YSlow guy”. After all, YSlow was well over 1M downloads. After a few days I found out that Tony was the creator of Fasterfox – topping 11M downloads. Needless to say, we hit it off and had a great year brainstorming techniques for optimizing web performance.

During that time, Tony was working with the Google Search team and discovered something interesting: ~15% of users with gzip-capable browsers were not sending an appropriate Accept-Encoding request header. As a result, they were sent uncompressed responses that were 3x bigger resulting in slower page load times. After some investigation, Tony discovered that intermediaries (proxies and anti-virus software) were stripping or munging the Accept-Encoding header. My blog post Who’s not getting gzip? summarizes the work with links to more information. Read Tony’s chapter in Even Faster Web Sites for all the details.

Tony is now working on Chrome, but the discovery he made has fueled the work of Andy Martone and others on the Google Search team to see if they could improve page load times for users who weren’t getting compressed responses. They had an idea:

  1. For requests with missing or mangled Accept-Encoding headers, inspect the User-Agent to identify browsers that should understand gzip.
  2. Test their ability to decompress gzip.
  3. If successful, send them gzipped content!

This is a valid strategy given that the HTTP spec says that, in the absence of an Accept-Encoding header, the server may send a different content encoding based on additional information (such as the encodings known to be supported by the particular client).

During his presentation at Velocity, Forcing Gzip Compression, Andy describes how Google Search implemented this technique:

  • At the bottom of a page, inject JavaScript to:
    • Check for a cookie.
    • If absent, set a session cookie saying “compression NOT ok”.
    • Write out an iframe element to the page.
  • The browser then makes a request for the iframe contents.
  • The server responds with an HTML document that is always compressed.
  • If the browser understands the compressed response, it executes the inlined JavaScript and sets the session cookie to “compression ok”.
  • On subsequent requests, if the server sees the “compression ok” cookie it can send compressed responses.
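The steps above can be sketched as three small functions. This is an illustration under stated assumptions, not Google's implementation: the cookie name gz and its values "0" (compression NOT ok) and "1" (compression ok) are hypothetical, and cookies are modeled as a plain object.

```javascript
// Server-side decision: compress if the browser asked for gzip, or if the
// header is missing/mangled but the earlier iframe test succeeded.
function shouldSendGzip(requestHeaders, cookies) {
  const accept = (requestHeaders["accept-encoding"] || "").toLowerCase();
  if (/\bgzip\b/.test(accept)) return true;
  return cookies.gz === "1";
}

// Bottom-of-page script: if no cookie exists yet, assume failure and ask
// for the test iframe to be written to the page.
function startCompressionTest(cookies) {
  if (cookies.gz !== undefined) return { cookies, loadIframe: false };
  return { cookies: { ...cookies, gz: "0" }, loadIframe: true };
}

// Inlined script in the always-compressed iframe document. It only runs if
// the browser was able to decompress the response.
function onIframeScriptExecuted(cookies) {
  return { ...cookies, gz: "1" };
}
```

Starting from "compression NOT ok" means that a browser that cannot decode the iframe simply never flips the cookie, so it keeps receiving uncompressed responses.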

The savings are significant. An average Google Search results page is 34 kB, which compresses down to 10 kB. The ability to send a compressed response cuts page load times by ~15% for these affected users.

Andy’s slides contain more details about how to only run the test once, recommended cookie lifetimes, and details on serving the iframe. Since this discovery I’ve talked to folks at other web sites that confirm these mysterious requests that are missing an Accept-Encoding header. Check it out on your web site – 15% is a significant slice of users! If you’d like to improve their page load times, take Andy’s advice and send them a compressed response that is smaller and faster.



Diffable: only download the deltas

July 9, 2010 9:31 am | 15 Comments

There were many new products and projects announced at Velocity, but one that I just found out about is Diffable. It’s ironic that I missed this one given that it happened at Velocity and is from Google. The announcement was made during a whiteboard talk, so it didn’t get much attention. If your web site has large JavaScript downloads you’ll want to learn more about this performance optimization technique.

The Diffable open source project has plenty of information, including the Diffable slides used by Josh Harrison and James deBoer at Velocity. As explained in the slides, Diffable uses differential compression to reduce the size of JavaScript downloads. It makes a lot of sense. Suppose your web site has a large external script. When a new release comes out, it’s often the case that the bulk of that large script is unchanged. And yet, users have to download the entire new script even if the old script is still cached.

Josh and James work on Google Maps which has a main script that is ~300K. A typical revision for this 300K script produces patches that are less than 20K. It’s wasteful to download that other 280K if the user has the old revision in their cache. That’s the inspiration for Diffable.

Diffable is implemented on the server and the client. The server component records revision deltas so it can return a patch to bring older versions up to date. The client component (written in JavaScript) detects if an older version is cached and if necessary requests the patch to the current version. The client component knows how to merge the patch with the cached version and evals the result.
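To make the client-side merge concrete, here is a minimal sketch of applying a delta to a cached script. The patch format is an assumption made up for illustration (an array whose items are either a [start, length] pair to copy from the old version or a string of new text); it is not Diffable's actual delta format.

```javascript
// Rebuild the new version from the cached text plus a patch.
// Hypothetical patch format: [start, length] copies a slice of the old
// text; a string inserts new text.
function applyPatch(oldText, patch) {
  let result = "";
  for (const op of patch) {
    if (typeof op === "string") {
      result += op;
    } else {
      const [start, length] = op;
      result += oldText.slice(start, start + length);
    }
  }
  return result;
}

// The patch could travel as JSON; here it reuses the first 28 characters of
// the cached script and appends a changed ending.
const cachedScript = "function add(a,b){return a+b}";
const patchFromServer = JSON.parse('[[0,28],"+1}"]');
console.log(applyPatch(cachedScript, patchFromServer));
// function add(a,b){return a+b+1}
```

In this toy case the patch is tiny compared to the script, which is the same effect the post describes for a 20K patch against a 300K script.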

The savings are significant. Using Diffable has reduced page load times in Google Maps by more than 1200 milliseconds (~25%). Note that this benefit only affects users that have an older version of the script in cache. For Google Maps that’s 20-25% of users.

In this post I’ve used scripts as the example, but Diffable works with other resources including stylesheets and HTML. The biggest benefit is with scripts because of their notorious blocking behavior. The Diffable slides contain more information including how JSON is used as the delta format, stats that show there’s no performance hit for using eval, and how Diffable also causes the page to be enabled sooner due to faster JavaScript execution. Give it a look.


Velocity: Google Maps API performance

July 7, 2010 1:09 pm | 1 Comment

Several months ago I saw Susannah Raub do an internal tech talk on the performance improvements behind Google Maps API v3. She kindly agreed to reprise the talk at Velocity. Luckily it was videotaped, and the slides (ODP) are available, too. It’s a strong case study on improving performance, is valuable for developers working with the Google Maps API, and has a few takeaways that I’ll blog about more soon.

Susannah starts off bravely by showing how Google Maps API v2 takes 17 seconds to load on an iPhone. This was the motivation for the work on v3 – to improve performance. In order to improve performance you have to start by measuring it. The Google Maps team broke down “performance” into three categories:

  • user perceived latency – how long it takes for the page to appear usable, in this case for the map to be rendered
  • page ready time – how long it takes for the page to become usable, e.g. for the map to be draggable
  • page load time – how long it takes for all the elements to be present, in the case of maps this includes all of the map controls to be loaded and working

The team wanted to measure all of these areas. It’s fairly easy to find tools to measure performance on the desktop – the Google Maps team used HttpWatch. Performance tools, or any development tools for that matter, are harder to come by in the mobile space. But the team especially wanted to focus on creating a fast experience on mobile devices. They ended up using Fiddler as a proxy to gain visibility into the page’s performance profile.

future blog post #1: Coincidentally, today I saw a tweet about Craig Dunn’s instructions for Monitoring iPhone web traffic (with Fiddler). This is a huge takeaway for anyone doing web development for mobile. At Velocity, Eric Lawrence (creator of Fiddler) announced Fiddler support for the HTTP Archive Specification. The HTTP Archive (HAR) format is a specification I initiated over a year ago with folks from HttpWatch and Firebug. HAR is becoming the industry standard just as I had hoped and is now supported in numerous developer tools. I wrote one such tool, called HAR to Page Speed, that takes a HAR file and displays a Page Speed performance analysis as well as an HTTP waterfall chart. Putting all these pieces together, you can now load a web site on your iPhone, monitor it with Fiddler, export it to a HAR file, and upload it to HAR to Page Speed to find out how it performs. Given Fiddler’s extensive capabilities for creating addons, I expect it won’t be long before all of this is built into Fiddler itself.
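Because a HAR file is just JSON, it is easy to post-process once exported. The sketch below is not how HAR to Page Speed works; it is a small, assumed example that reads the standard log.entries array and reports the request count, total response body bytes, and the slowest request.

```javascript
// Summarize a parsed HAR object using fields from the HAR format:
// entry.time (total milliseconds), entry.request.url, and
// entry.response.bodySize (-1 when the size is unknown).
function summarizeHar(har) {
  const entries = har.log.entries;
  let totalBytes = 0;
  let slowest = null;
  for (const entry of entries) {
    if (entry.response.bodySize > 0) totalBytes += entry.response.bodySize;
    if (slowest === null || entry.time > slowest.time) slowest = entry;
  }
  return {
    requests: entries.length,
    totalBytes,
    slowestUrl: slowest ? slowest.request.url : null,
  };
}
```

A file exported from a tool would be loaded with JSON.parse before being passed to summarizeHar.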

In the case of Google Maps API, the long pole in the tent was main.js. They have a small (15K) bootstrap script that loads main.js (180K). (All of the script sizes in this blog post are UNcompressed sizes.) The performance impact of main.js was especially bad on mobile devices because of less caching. They compiled their JavaScript (using Closure) and combined three HTTP requests into one.

future blog post #2: The team also realized that although their JavaScript download was large, the revisions between releases were small. They created a framework for only downloading deltas when possible that cut seconds off their download times. More on this tomorrow.

These performance improvements helped, but they wanted to go further. They redesigned their code using an MVC architecture. As a result, the initial download only needs to include the models, which are small. The larger views and controllers that do all the heavy lifting are loaded asynchronously. This reduced the initial bootstrap script from 15K to 4K, and the main.js from 180K to 33K.

The results speak for themselves. Susannah concludes by showing how v3 of Google Maps API takes only 5 seconds to load on the iPhone, compared to v2’s 17 seconds. The best practices the team employed for making Google Maps faster are valuable for anyone working on JavaScript-heavy web sites. Take a look at the video and slides, and watch here for a follow-up on Fiddler for iPhone and loading JavaScript deltas.


Google adds site speed to search ranking

April 9, 2010 7:46 am | 12 Comments

Today, Google announced that a site’s speed has been added as a signal to Google’s search ranking algorithm: Using site speed in web search ranking.

In March 2008, one month after I started working here, Google announced that site speed was being incorporated into Adwords Quality Score. When I wrote my blog post about that change to Adwords (Google fosters a faster Internet) I had no idea that this was the beginning of a long series of contributions from Google for creating a faster Web. Since that time Google has released:

Two years ago when I talked to people about the Adwords change, most people thought it was a good idea, but the most frequent response was, “Doesn’t this favor larger companies that care about performance?” In my experience, small companies that care about performance are able to make improvements much more quickly than large companies. Small companies are typically more agile and have less legacy code to worry about.

I’m excited to see web performance optimization become a competitive advantage, and look forward to helping web developers around the world make their sites even faster. Make sure to run Page Speed and YSlow to find the most important performance improvements. If you still have questions, feel free to contact me. I’ll be happy to analyze your web site and give you some tips.

As much as I’m excited about how Google’s announcement raises awareness about web performance optimization among companies and developers, I’m most excited about what this means for users. Faster web sites lead to a better user experience. And that’s what it’s all about.


P3PC: Google AdSense

March 29, 2010 1:44 pm | 6 Comments

P3PC is a project to review the performance of 3rd party content such as ads, widgets, and analytics. You can see all the reviews and stats on the P3PC home page. This blog post looks at Google AdSense. Here are the summary stats:

  • impact on page: big
  • Page Speed: 87
  • YSlow: 84
  • doc.write: y
  • total reqs: 8
  • total xfer size: 41 kB
  • JS ungzip: 76 kB
  • DOM elems: 9
  • median Δ load time: 222 ms

After signing up for Google AdSense, you can set up different types of ads. I chose “AdSense for Content” (listed first). Here’s what an example ad looks like. (This is a static image. Go to the Compare page to see the snippet live.)

Snippet Code

Let’s look at the actual snippet code:

1: <script type="text/javascript"><!--
2: google_ad_client = "pub-0478442537074871";
3: /* 300×250, created 3/6/10 */
4: google_ad_slot = "4427977761";
5: google_ad_width = 300;
6: google_ad_height = 250;
7: //-->
8: </script>
9: <script type="text/javascript"
10: src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
11: </script>
snippet code as of March 10, 2010

A quick walk through the snippet code:

  • lines 2-6 – Define global variables that are used by the show_ads.js script.
  • lines 9-11 – Load the show_ads.js script.

Performance Analysis

This HTTP waterfall chart was generated by WebPagetest.org using IE 7 with a 1.5Mbps connection from Dulles, VA. It reveals a lot of idiosyncrasies with scripts, browsers, and HTTP. Let’s step through each request.

  • item 1: compare.php – The HTML document.
  • item 2: show_ads.js – The main script. All other downloads are blocked because this is loaded using the SCRIPT SRC HTML tag.
  • items 3-5 – Other scripts that are loaded dynamically by show_ads.js. This dynamic loading is done by using document.write. Using document.write causes the scripts themselves to be downloaded in parallel in IE (as evidenced by the waterfall chart). However, subsequent resources are still blocked (item 7).
  • item 6: ads – The actual ad content from doubleclick.net. Google AdSense creates an iframe to hold the ad, so this is the HTML document contained in that iframe.
  • item 7: *-waterfall.png – The waterfall image in this page. This is the main content of the page. Notice how it’s blocked by the previous four scripts.
  • items 8-9: abg-en-100c-000000.png – The “Ads by Google” image. This is loaded twice in IE because it’s referenced twice in the ad: as an IMG and as part of AlphaImageLoader.
  • item 10: sma8.js – A script loaded by the ad. Because the ad is in an iframe, this script won’t block any resources in the main page.

Now that we have a handle on the HTTP requests involved, let’s look at the most important performance issues along with recommended solutions.

1. The scripts block the main content of the page from loading.

It would be better to load the scripts without blocking. This isn’t possible with the current implementation because the ad is inserted using document.write. Calling document.write from scripts loaded asynchronously may lead to ads being inserted in the wrong location or potentially the entire page being blank. The ideal solution would be to create a DIV with the desired width and height to hold the ad and load the scripts asynchronously, similar to what BuySellAds.com does.
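A rough sketch of that DIV-plus-async approach follows. It is not AdSense or BuySellAds code: loadAdAsync is a hypothetical helper, and it assumes an ad script that renders into the reserved container instead of calling document.write (which show_ads.js, as analyzed here, does not do).

```javascript
// Reserve space for the ad with a fixed-size DIV, then load the ad script
// without blocking the rest of the page.
// `doc` is the page's document, `parent` the element that should hold the ad.
function loadAdAsync(doc, parent, adScriptUrl, width, height) {
  const slot = doc.createElement("div");
  slot.style.width = width + "px";
  slot.style.height = height + "px";
  parent.appendChild(slot);

  // Dynamically inserted scripts do not block other downloads.
  const script = doc.createElement("script");
  script.src = adScriptUrl;
  script.async = true;
  doc.head.appendChild(script);

  return slot; // the ad script would fill this element once it arrives
}
```

Reserving the width and height up front keeps the page from reflowing when the ad eventually renders.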

2. Most of the resources are only cacheable for a day.

It’s understandable that show_ads.js is only cacheable for a day. If this script changed (bug fix, new feature), there would be no way to rev the filename (since the snippet is embedded in the publishers’ pages). A short expiration date ensures users will get the updated version sooner (within a day). However, expansion_embed.js, abg-en-100c-000000.png, and sma8.js are also only cacheable for a day. These should have a far future expiration date (a year or more). If there was a change to expansion_embed.js (for example), the new version could be pushed with a modified filename (expansion_embed.1.1.js) and the code in show_ads.js could be modified to reference this new filename.
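The caching policy suggested above can be expressed as a small rule. This is only a sketch: cacheControlFor is a hypothetical helper, and treating a dotted version number before the extension (as in expansion_embed.1.1.js) as the marker for a revved file is an assumption for illustration.

```javascript
const ONE_DAY = 24 * 60 * 60;   // seconds
const ONE_YEAR = 365 * ONE_DAY; // seconds

// Versioned filenames can be cached for a year because a change ships under
// a new name; unversioned ones (like show_ads.js) keep a short lifetime.
function cacheControlFor(filename) {
  const versioned = /\.\d+(\.\d+)*\.(js|png|css)$/.test(filename);
  return "public, max-age=" + (versioned ? ONE_YEAR : ONE_DAY);
}

console.log(cacheControlFor("show_ads.js"));            // public, max-age=86400
console.log(cacheControlFor("expansion_embed.1.1.js")); // public, max-age=31536000
```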

3. abg-en-100c-000000.png is downloaded twice.

This is happening because an AlphaImageLoader filter is used to achieve alpha transparency in IE 6. The HTML looks like this:

1: <span style="display:inline-block;height:16px;width:78px;
2:     filter:progid:DXImageTransform.Microsoft.AlphaImageLoader(
3:     src='http://…/abg-en-100c-000000.png');">
4: <img src=http://…/abg-en-100c-000000.png
5:     alt="Ads by Google" border=0 height=16 width=78
6:     style=filter:progid:DXImageTransform.Microsoft.Alpha(opacity=0)>
7: </span>

Notice that abg-en-100c-000000.png is used in the src in both lines 3 and 4. I’m not sure why the IMG is being used with an opacity of 0 (preserve space?), but I bet there’s a workaround. Also, this should only be necessary in IE 6, so the AlphaImageLoader should be skipped for IE 7 and 8.

4. Five scripts are downloaded.

Some of the scripts could be combined to reduce the number of HTTP requests.



Page Speed 1.6 Beta – new rules, native library

February 1, 2010 9:48 pm | 8 Comments

Page Speed 1.6 Beta was released today. There are a few big changes, but the most important fix is compatibility with Firefox 3.6. If you’re running the latest version of Firefox visit the download page to get Page Speed 1.6. Phew!

I wanted to highlight some of the new features mentioned in the 1.6 release notes: new rules and native library.

Three new rules were added as part of Page Speed 1.6:

  • Specify a character set early – If you don’t specify a character set for your web pages or specify it too low in the page, the browser could parse it incorrectly. You can specify a character set using the META tag or in the Content-Type response header. Returning charset in the Content-Type header will ensure the browser sees it early. (See this Zoompf post for more information.)
  • Minify HTML – Top performing web sites are already on top of this, right? Analyzing the Alexa U.S. top 10 shows an average savings of 8% if they minified their HTML. You can easily check your site with this new rule, and even save the optimized version.
  • Minimize Request Size – Okay, this is cool and shows how Google tries to squeeze out every last drop of performance. This rule checks whether the total size of the request headers exceeds one packet (~1500 bytes). Requiring a roundtrip just to submit the request hurts performance, especially for users with high latency.
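To get a feel for the Minimize Request Size rule, the sketch below estimates how many bytes a GET request occupies and compares that to the ~1500 byte figure quoted above. It is an approximation under stated assumptions, not Page Speed's own logic: estimateRequestSize and fitsInOnePacket are hypothetical helpers, and the usable payload of a real packet is somewhat smaller than 1500 bytes once TCP/IP headers are counted.

```javascript
// Estimate the size of a GET request: request line, headers, blank line.
function estimateRequestSize(path, headers) {
  let text = "GET " + path + " HTTP/1.1\r\n";
  for (const [name, value] of Object.entries(headers)) {
    text += name + ": " + value + "\r\n";
  }
  text += "\r\n";
  return new TextEncoder().encode(text).length; // size in bytes
}

// Compare against a packet-size limit (default taken from the rule text).
function fitsInOnePacket(path, headers, limit = 1500) {
  return estimateRequestSize(path, headers) <= limit;
}
```

Large cookies are the usual culprit: a single 2 kB Cookie header is enough to push a request past the limit.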

The other big feature I wanted to highlight first came out in Page Speed 1.5 but didn’t get much attention – the Page Speed C++ Native Library. It probably didn’t get much attention because it’s one of those changes that, if done correctly, no one notices. The work behind the native library involves porting the rules from JavaScript to C++. Why bother? Here’s what the release notes say:

This should speed up scoring, as well as allow rules to be run in programs other than just the Page Speed Firefox extension.

Making Page Speed run faster is great, but the idea of implementing the performance logic in a C++ library so the rules can be run in other programs is very cool. And where have we seen this recently? In the Site Performance section recently added to Webmaster Tools. Now we have a server-side tool that produces the same recommendations found from running the Page Speed add-on. Here are the rules that have been ported to the native library:

added in 1.5:

  • Combine external JavaScript
  • Combine external CSS
  • Enable gzip compression
  • Optimize images
  • Minimize redirects
  • Minimize DNS lookups
  • Avoid bad requests
  • Serve resources from a consistent URL

added in 1.6:

  • Specify charset early
  • Minify HTML
  • Minimize request size
  • Put CSS in the document head
  • Minify CSS
  • Optimize the order of styles and scripts
  • Serve scaled images
  • Specify image dimensions

Webmaster Tools Site Performance today shows recommendations based on the rules in native library 1.5. Now that more rules have been added to native library 1.6, webmasters can expect to see those recommendations in the near future. But this integration shouldn’t stop with Webmaster Tools. I’d love to see other tools and services integrate native library. If you’re interested in using native library, check out the page-speed project on Google Code and contact the page-speed-discuss Google Group.


jQuery 1.4 performance

January 15, 2010 11:32 am | 5 Comments

jQuery 1.4 was released yesterday. I lifted the text from the release announcement, removed stop words, converted to lowercase, and found the ten most used words:

  1. jquery (71)
  2. function (27)
  3. performance (23)
  4. object (20)
  5. events (19)
  6. element (15)
  7. ajax (15)
  8. dom (13)
  9. json (12)
  10. request (10)
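A word count like the one above takes only a few lines. This is a reconstruction of the general idea, not the script actually used for the list: the stop-word list here is a short, made-up sample, and the tie-breaking order is an arbitrary choice.

```javascript
// Hypothetical, deliberately short stop-word list; a real one is longer.
const STOP_WORDS = new Set([
  "the", "a", "an", "and", "of", "to", "in", "is", "for", "that", "it", "with",
]);

// Lowercase the text, drop stop words, and return the most frequent words
// as [word, count] pairs, highest count first (ties sorted alphabetically).
function topWords(text, limit = 10) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9']+/g) || []) {
    if (STOP_WORDS.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}
```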

That’s right, “performance” comes in third ahead of “object”, “element”, and even “dom”. Anyone think jQuery 1.4 had a focus on performance? Here’s what John Resig says.

Performance Overhaul of Popular Methods

Many of the most popular and commonly used jQuery methods have seen a significant rewrite in jQuery 1.4. When analyzing the code base we found that we were able to make some significant performance gains by comparing jQuery against itself: Seeing how many internal function calls were being made and to work to reduce the complexity of the code base.

He includes this chart that shows the reduction of complexity for some popular functions.

Of course, all of this is music to my ears. There was one other specific note that caught my eye in this commit comment:

Switched from using YUI Compressor to Google Compiler. Minified and Gzipped filesize reduced to 22,839 bytes from 26,169 bytes (13% decrease in filesize).

Minifying JavaScript is one of the rules I wrote about in High Performance Web Sites. Back then (2006-2007), the best tool for minifying was JSMin from Doug Crockford. It still might be the best tool today for minifying in realtime (e.g., dynamic Ajax and JSON responses). For minifying static files, YUI Compressor (released in late 2007) does a better job. It also works on CSS. So this move from YUI Compressor to the Google Closure Compiler by John Resig, someone who obviously cares about performance, is a big deal.

For jQuery 1.4, the savings from switching to Compiler was 13%. If you have done comparisons with your code, please add your stats via a comment below.

My last blog post (Stuck inside Classic Rock) got pretty esoteric at the end when I started talking about Quality, and I promised a follow-up post on how that related to web performance. I’m still working on that post, but am happy to take this digression. But is it a digression? I’ve been talking to folks over the past week about how they strive for and compromise on quality in their jobs. We all compromise on quality to a certain degree. But occasionally, a person is afforded the opportunity to dedicate a significant portion of their life to a single-minded purpose, and can reach levels of Quality that stand out in comparison. John Resig has achieved that. Congratulations to John and the jQuery team. Keep up the good (high performance) work!


Speed Tracer – visibility into the browser

December 10, 2009 7:46 am | 9 Comments

Is it just me, or does anyone else think Google’s on fire lately, lighting up the world of web performance? Quick review of news from the past two weeks:

Speed Tracer was my highlight from last night’s Google Campfire One. The event celebrated the release of GWT 2.0. Performance and “faster” were emphasized again and again throughout the evening’s presentations (I love that). GWT’s new code splitting capabilities are great for performance, but Speed Tracer easily wowed the audience – including me. In this post, I’ll describe what I like about Speed Tracer, what I hope to see added next, and then I’ll step back and talk about the state of performance profilers.

Getting started with Speed Tracer

Some quick notes about Speed Tracer:

  • It’s a Chrome extension, so it only runs in Chrome. (Chrome extensions is yet another announcement this week.)
  • It’s written in GWT 2.0.
  • It works on all web sites, even sites that don’t use GWT.

The Speed Tracer getting started page provides the details for installation. You have to be on the Chrome dev channel. Installing Speed Tracer adds a green stopwatch to the toolbar. Clicking on the icon starts Speed Tracer in a separate Chrome window. As you surf sites in the original window, the performance information is shown in the Speed Tracer window.

Beautiful visibility

When it comes to optimizing performance, developers have long been working in the dark. Without the ability to measure JavaScript execution, page layout, reflows, and HTML parsing, it’s not possible to optimize the pain points of today’s web apps. Speed Tracer gives developers visibility into these parts of page loading via the Sluggishness view, as shown here. (Click on the figure to see a full screen view.) Not only is this kind of visibility great, but the display is just, well, beautiful. Good UI and dev tools don’t often intersect, but when they do it makes development that much easier and more enjoyable.

Speed Tracer also has a Network view, with the requisite waterfall chart of HTTP requests. Performance hints are built into the tool, flagging issues such as bad cache headers, exceedingly long responses, Mozilla cache hash collisions, too many reflows, and uncompressed responses. Speed Tracer also supports saving and reloading the profiled information. This is extremely useful when working on bugs or analyzing performance with other team members.
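To make the idea of performance hints concrete, here is a minimal sketch of two such checks in Python: missing cache headers and uncompressed text responses. This is not Speed Tracer's actual rule set. The function name, the 1024-byte threshold, and the exact conditions are assumptions chosen for illustration.

```python
def performance_hints(status, headers, body_size, min_compress_bytes=1024):
    """Return a list of hint strings for one HTTP response.

    Hypothetical helper: the rules and the threshold are illustrative
    assumptions, not the checks any particular tool performs.
    """
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    hints = []

    # Hint 1: a successful response that carries no caching information.
    if status == 200 and "cache-control" not in h and "expires" not in h:
        hints.append("no Cache-Control or Expires header")

    # Hint 2: a text response large enough to benefit from compression
    # that was sent without gzip or deflate encoding.
    content_type = h.get("content-type", "").lower()
    is_text = (
        content_type.startswith("text/")
        or "javascript" in content_type
        or "json" in content_type
    )
    encoding = h.get("content-encoding", "").lower()
    if is_text and body_size >= min_compress_bytes and encoding not in ("gzip", "deflate"):
        hints.append("uncompressed text response")

    return hints
```

Calling it with a 50 kB script served with only a Content-Type header would report both hints, while an image with no cache headers would report only the first.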

Feature requests

I’m definitely going to be using Speed Tracer. For a first version, it’s extremely feature rich and robust. There are a few enhancements that will make it even stronger:

  • overall pie chart – The “breakdown by time” for phases like script evaluation and layout is available for segments within a page load. This detail is great when drilling down on a specific load segment, but as a starting point I’d like to see the breakdown for the entire page. Overall stats would give developers a clue about where to focus most of their attention.
  • network timing – Similar to the issues I discovered in Firebug Net Panel, long-executing JavaScript in the main page blocks the network monitor from accurately measuring the duration of HTTP requests. This will likely require changes to WebKit to record event times in the events themselves, as was done in the fix for Firefox.
  • .HAR support – Being able to save Speed Tracer’s data to file and share it is great. Recently, Firebug, HttpWatch, and DebugBar have all launched support for the HTTP Archive file format I helped create. The format is extensible, so I hope to see Speed Tracer support the .HAR file format soon. Being able to share performance information across tools and browsers is a necessary next step. That’s a good segue…
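Because HAR is plain JSON, a shared file can be analyzed with a few lines of script regardless of which tool produced it. The sketch below sums request time per MIME type to get a whole-page view. The sample document is hand-written and trimmed to the fields this example reads, so it is not a complete HAR file, and the URLs and timings are made up for illustration.

```python
import json
from collections import defaultdict

# Hand-written sample, trimmed to the fields used below (assumption:
# a real HAR file carries many more fields per entry).
SAMPLE_HAR = """
{
  "log": {
    "version": "1.1",
    "entries": [
      {"time": 120,
       "request": {"method": "GET", "url": "http://example.com/"},
       "response": {"status": 200,
                    "content": {"mimeType": "text/html; charset=utf-8"}}},
      {"time": 80,
       "request": {"method": "GET", "url": "http://example.com/a.js"},
       "response": {"status": 200,
                    "content": {"mimeType": "application/javascript"}}},
      {"time": 40,
       "request": {"method": "GET", "url": "http://example.com/b.js"},
       "response": {"status": 200,
                    "content": {"mimeType": "application/javascript"}}}
    ]
  }
}
"""


def time_by_mime_type(har_text):
    """Sum each entry's total time (milliseconds) per response MIME type."""
    har = json.loads(har_text)
    totals = defaultdict(float)
    for entry in har["log"]["entries"]:
        # Drop parameters such as "; charset=utf-8" from the MIME type.
        mime = entry["response"]["content"]["mimeType"].split(";")[0].strip()
        totals[mime] += entry["time"]
    return dict(totals)


totals = time_by_mime_type(SAMPLE_HAR)
```

Note that requests overlap in a real page load, so these sums show where request time accumulates, not how long the page took to load.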

Developers need more

Three years ago, there was only one tool for profiling web pages: Firebug. Developers love working in Firefox, but sometimes you just have to profile in Internet Explorer. Luckily, over the last year we’ve seen some good profilers come out for IE including MSFast, AOL Pagetest, WebPagetest.org, and dynaTrace Ajax Edition. DynaTrace’s tool is the most recent addition, and has great visibility similar to Speed Tracer, as well as JavaScript debugging capabilities. There have been great enhancements to Web Inspector, and the Chrome team has built on top of that adding timeline and memory profiling to Chrome. And now Speed Tracer is out and bubbling to the top of the heap.

The obvious question is:

Which tool should a developer choose?

But the more important question is:

Why should a developer have to choose?

There are eight performance profilers listed here. None of them work in more than a single browser. I realize web developers are exceedingly intelligent and hardworking, but no one enjoys having to use two different tools for the same task. But that’s exactly what developers are being asked to do. To be a good developer, you have to be profiling your web site in multiple browsers. By definition, that means you have to install, learn, and update multiple tools. In addition, there are numerous quirks to keep in mind when going from one tool to another. And the features offered are not consistent across tools. It’s a real challenge to verify that your web app performs well across the major browsers. When pressed, the rock star web developers I ask admit they only use one or two profilers – it’s just too hard to stay on top of a separate tool for each browser.

This week at Add-on-Con, Doug Crockford’s closing keynote is about the Future of the Web Browser. He’s assembled a panel of representatives from Chrome, Opera, Firefox, and IE. (Safari declined to attend.) My hope is they’ll discuss the need for a cross-browser extension model. There’s been progress in building protocols to support remote debugging: WebDebugProtocol and Crossfire in Firefox, Scope in Opera, and ChromeDevTools in Chrome. My hope for 2010 is that we see cross-browser convergence on standards for extensions and remote debugging, so that developers will have a slightly easier path for ensuring their apps are high performance on all browsers.


How browsers work

December 4, 2009 1:11 pm | 2 Comments

My initial work on the Web was on the backend – C++, Java, databases, Apache, etc. In 2005, I started focusing on web performance. To get a better idea of what made web sites slow, I surfed numerous sites with a packet sniffer open. That’s when I discovered that the bulk of the time spent loading a web site occurs on the frontend, after the HTML document arrives at the browser.

Not knowing much about how the frontend worked, I spent a week searching for anything that could explain what was going on in the browser. The gem that I found was David Hyatt’s blog post entitled Testing Page Load Speed. His article opened my eyes to the complexity of what the browser does, and launched my foray into finding ways to optimize page load times, resulting in things like YSlow and High Performance Web Sites.

Today’s post on the Chromium Blog (Technically speaking, what makes Google Chrome fast?) contains a similar gem. Mike Belshe, Chrome developer and co-creator of SPDY, talks about the performance optimizations inside of Chrome. But in so doing, he also reveals insights into how all browsers work and the challenges they face. For example, until I saw this, I didn’t have a real appreciation for the performance impact of DOM bindings – the connections between the JavaScript that modifies web pages and the C++ that implements the browser. He also talks about garbage collection, concurrent connections, lookahead parsing and downloading, domain sharding, and multiple processes.
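Of the topics in that list, domain sharding is the easiest to illustrate in a few lines: resources are spread across several hostnames so the browser opens more parallel connections than its per-hostname limit allows. Here is a minimal sketch, assuming two hypothetical shard hostnames. It is not taken from Mike's post. The key design choice is a stable hash, so a given resource always maps to the same hostname and stays cacheable under a single URL.

```python
import zlib

# Hypothetical shard hostnames, for illustration only.
SHARDS = ["static1.example.com", "static2.example.com"]


def shard_host(path, shards):
    """Pick a hostname for a resource path, deterministically."""
    # crc32 gives the same value on every run, unlike Python's built-in
    # hash() for strings, which is randomized per process by default.
    index = zlib.crc32(path.encode("utf-8")) % len(shards)
    return shards[index]


def shard_url(path, shards=SHARDS):
    """Build the full URL for a resource on its assigned shard."""
    return "http://" + shard_host(path, shards) + path
```

Adding or removing a hostname changes the mapping for many paths, which invalidates cached copies, so the shard list is best kept stable once deployed.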

Take 16.5 minutes and watch Mike’s video. It’s well worth it.
