JavaScript Performance

January 13, 2012 10:09 pm | 19 Comments

Last night I spoke at the San Francisco JavaScript Meetup. I gave a brand new talk called JavaScript Performance that focuses on script loading and async snippets. The snippet example I chose was the Google Analytics async snippet. The script-loading part of that snippet is only six lines, but a lot of thought and testing went into it. It’s a great prototype to use if you’re creating your own async snippet. I’ll tweet if/when the video of my talk comes out, but in the meantime the slides (Slideshare, pptx) do a good job of relaying the information.

There are two new data points from the presentation that I want to call out in this blog post.

Impact of JavaScript

The presentation starts by suggesting that JavaScript is typically the #1 place to look for making a website faster. My anecdotal experience supports this hypothesis, but I wanted to try to do some quantitative verification. As often happens, I turned to WebPagetest.

I wanted to test the Alexa Top 100 URLs with and without JavaScript. To load these sites withOUT JavaScript I used WebPagetest’s “block” feature. I entered “.js” which tells WebPagetest to ignore every HTTP request with a URL that contains that string. Each website was loaded three times and the median page load time was recorded. I then found the median of all these median page load times.

The median page load with JavaScript is 3.65 seconds. Without JavaScript the page load time drops to 2.487 seconds – a 31% decrease. (Here’s the data in WebPagetest: with JavaScript, without JavaScript.) It’s not a perfect analysis: Some script URLs don’t contain “.js” and inline script blocks are still executed. I think this is a good approximation and I hope to do further experiments to corroborate this finding.

Async Execution Order & Onload

The other new infobyte has to do with the async=true line from the GA async snippet. The purpose of this line is to cause the ga.js script to not block other async scripts from being executed. It turns out that some browsers preserve the execution order of scripts loaded using the insertBefore technique, which is the technique used in the GA snippet:

var ga = document.createElement(‘script’);
ga.type = ‘text/javascript’;
ga.async = true;
ga.src = (‘https:’ == document.location.protocol ? ‘https://ssl’ : ‘http://www’) + ‘.google-analytics.com/ga.js’;
var s = document.getElementsByTagName(‘script’)[0];
s.parentNode.insertBefore(ga, s);

Preserving execution order of async scripts makes the page slower. If the first async script takes a long time to download, all the other async scripts are blocked from executing, even if they download sooner. Executing async scripts immediately as they’re downloaded results in a faster page load time. I knew old versions of Firefox had this issue, and setting async=true fixed the problem. But I wanted to see if any other browsers also preserved execution order of async scripts loaded this way, and whether setting async=true worked.

To answer these questions I created a Browserscope user test called Async Script Execution Order. I tweeted the test URL and got 348 results from 60+ different browsers. (Thanks to all the people that ran the test! I still need results from more mobile browsers so please run the test if you have a browser that’s not covered.) Here’s a snapshot of the results:

The second column shows the results of loading two async scripts with the insertBefore pattern AND setting async=true. The third column shows the results if async is NOT set to true. Green means the scripts execute immediately (good) and red indicates that execution order is preserved (bad).

The results show that Firefox 3.6, OmniWeb 622, and all versions of Opera preserve execution order. Setting async=true successfully makes the async scripts execute immediately in Firefox 3.6 and OmniWeb 622, but not in Opera. Although this fix only applies to a few browsers, its small cost makes it worthwhile. Also, if we get results for more mobile browsers I would expect to find a few more places where the fix is necessary.

I also tested whether these insertBefore-style async scripts block the onload event. The results, shown in the fourth column, are mixed if we include older browsers, but we see that newer browsers generally block the onload event when loading these async scripts – this is true in Android, Chrome, Firefox, iOS, Opera, Safari, and IE 10. This is useful to know if you wonder why you’re still seeing long page load times even after adopting async script loading. It also means that code in your onload handler can’t reliably assume async scripts are loaded because of the many browsers out there that do not block the onload event, including IE 6-9.

And a final shout out to the awesomeness of the Open Source community that makes tools like WebPagetest and Browserscope available – thanks Pat and Lindsey!

19 Comments

frontend SPOF survey

October 13, 2011 9:10 am | 3 Comments

Pat Meenan had a great blog post yesterday, Testing for Frontend SPOF. “SPOF” means single point of failure. I coined the term frontend SPOF to describe the all-too-likely situation where the HTML document returns successfully, but some other resource (a stylesheet, script, or font file) blocks the entire website from loading. This typically manifests itself as a blank white screen that the user stares out for 20 seconds or longer.

Frontend SPOF happens most frequently with third party content. If the HTML document returns successfully, then all the resources from the main website are likely to return successfully, as well. Third party content, however, isn’t controlled by the main website and thus could be suffering an outage or overload while the main website is working fine. As a result, the uptime of a website is no greater than the availability of the third party resources it uses that are in a position to cause frontend SPOF.

In my blog post of the same name I describe how Frontend SPOF happens and ways to avoid it, but I don’t provide a way for website owners to determine which third party resources may cause frontend SPOF. This is where Pat comes in. He’s created a public blackhole server: blackhole.webpagetest.org with the static IP address 72.66.115.13. Pointing your third party resources to this blackhole and reloading the page tells you if those resources cause frontend SPOF. Since Pat is the creator of WebPagetest.org, he has integrated this into the scripting capabilities of that tool so website owners can load their website and determine if any third party resources cause frontend SPOF.

/etc/hosts

I took a different approach outlined by Pat: I added the following lines to my /etc/hosts file (your location may vary) mapping these third party hostnames to point to the blackhole server:

72.66.115.13 apis.google.com
72.66.115.13 www.google-analytics.com
72.66.115.13 connect.facebook.net
72.66.115.13 platform.twitter.com
72.66.115.13 s7.addthis.com
72.66.115.13 l.addthiscdn.com
72.66.115.13 cf.addthis.com
72.66.115.13 api-public.addthis.com
72.66.115.13 widget.quantcast.com
72.66.115.13 ak.quantcast.com
72.66.115.13 assets.omniture.com
72.66.115.13 www.omniture.com
72.66.115.13 scripts.omniture.com
72.66.115.13 b.voicefive.com
72.66.115.13 ar.voicefive.com
72.66.115.13 c.statcounter.com
72.66.115.13 www.statcounter.com
72.66.115.13 www-beta.statcounter.com
72.66.115.13 js.revsci.net

After restarting my browser all requests to these hostnames will timeout. Pat’s blog post mentions 20 seconds for a timeout. He was running on Windows. I’m running on my Macbook where the timeout is 75 seconds! Any website that references third party content on these hostnames in a way that produces frontend SPOF will be blank for 75 seconds – an easy failure to spot.

survey says

THE GOOD: At this point I started loading the top 100 US websites. I was pleasantly surprised. None of the top 20 websites suffered from frontend SPOF. There were several that loaded third party content from these hostnames, but they had safeguarded themselves:

  • MSN makes a request to ar.voicefive.com, but does it asynchronously using a document.write technique.
  • AOL references platform.twitter.com, but puts the SCRIPT tag at the very bottom of the BODY so page rendering isn’t blocked.
  • IMDB uses the async version of Google Analytics, and puts the platform.twitter.com widget in an iframe.
  • LiveJournal goes above and beyond by wrapping the Google +1 and Facebook widgets in a homegrown async script loader.

THE BAD: Going through the top 100 I found five websites that had frontend SPOF:

  1. CNET loads http://platform.twitter.com/widgets.js in the HEAD as a blocking script.
  2. StumbleUpon loads http://connect.facebook.net/en_US/all.js at the top of BODY as a blocking script.
  3. NFL loads http://connect.facebook.net/en_US/all.js in the HEAD as a blocking script.
  4. Hulu, incredibly, loads Google Analytics in the HEAD as a blocking script. Please use the async snippet!
  5. Expedia loads http://connect.facebook.net/en_US/all.js as a blocking script in the middle of the page, so the right half of the page is blocked from rendering.

These results, although better than I expected, are still alarming. Although I only found five websites with frontend SPOF, that’s 5% of the overall sample. The percentage will likely grow as the sample size grows because best practices are more widely adopted by the top sites. Also, my list of third party hostnames is a small subset of all widgets and analytics available on the Web. And remember, I didn’t even look at ads.

Is it really worth blocking your site’s entire page for a widget button or analytics beacon – especially when workarounds exist? If you’re one of the five sites that faltered above, do yourself and your users a favor and find a way to avoid frontend SPOF. And if you’re outside the top 100, test your site using Pat’s blackhole server by editing /etc/hosts or following Pat’s instructions for testing frontend SPOF on WebPagetest.org.

3 Comments

ControlJS part 3: overriding document.write

December 15, 2010 11:18 pm | 20 Comments

This is the third of three blog posts about ControlJS – a JavaScript module for making scripts load faster. The three blog posts describe how ControlJS is used for async loading, delayed execution, and overriding document.write.

The goal of ControlJS is to give developers more control over how JavaScript is loaded. The key insight is to recognize that “loading” has two phases: download (fetching the bytes) and execution (including parsing). In ControlJS part 1 I focused on the download phase using ControlJS to download scripts asynchronously. ControlJS part 2 showed how delaying script execution makes pages load faster, especially for Ajax web apps that download a lot of JavaScript that’s not used immediately. In this post I look at how document.write causes issues when loading scripts asynchronously.

I like ControlJS, but I’ll admit now that I was unable to solve the document.write problem to my satisfaction. Nevertheless, I describe the dent I put in the problem of document.write and point out two alternatives that do a wonderful job of making async work in spite of document.write.

Async and document.write

ControlJS loads scripts asynchronously. By default these async scripts are executed after window onload. If one of these async scripts calls document.write the page is erased. That’s because calling document.write after the document is loaded automatically calls document.open. Anything written to the open document replaces the existing document content. Obviously, this is not the desired behavior.

I wanted ControlJS to work on all scripts in the page. Scripts for ads are notorious for using document.write, but I also found scripts created by website owners that use document.write. I didn’t want those scripts to be inadvertently loaded with ControlJS and have the page get wiped out.

The problem was made slightly easier for me because ControlJS is in control when each inline and external script is being executed. My solution is to override document.write and capture the output for each script one at a time. If there’s no document.write output then proceed to the next script. If there is output then I create an element (for now a SPAN), insert it into the DOM immediately before the SCRIPT tag that’s being executed, and set the SPAN’s innerHTML to the output from document.write.

This works pretty well. I downloaded a local copy of CNN.com and converted it to ControlJS. There were two places where document.write was used (including an ad) and both worked fine. But there are issues. One issue I encountered is that wrapping the output in a SPAN can break certain behavior like hover handlers. But a bigger issue is that browsers ignore “<script…” added via innerHTML, and one of the main ways that ads use document.write is to insert scripts. I have a weak solution to this that parses out the “<script…” HTML and creates a SCRIPT DOM element with the appropriate SRC. It works, but is not very robust.

ControlJS is still valuable even without a robust answer for document.write. You don’t have to convert every script in the page to use ControlJS. If you know an ad or widget uses document.write you can load those scripts the normal way. Or you can test and see how complex the document.write is – perhaps it’s a use cases that ControlJS handles.

Real solutions for async document.write

I can’t end this series of posts without highlighting two alternatives that have solved the asynchronous document.write issue: Aptimize and GhostWriter.

Aptimize sells a web accelerator that changes HTML markup on the fly to have better performance. In September they announced the capability to load scripts asynchronously – including ads. As far as I know this is the first product that offered this feature. I had lunch with Ed and Derek from Aptimize this week and found out more about their implementation. It sounds solid.

Another solution is GhostWriter from Digital Fulcrum. Will contacted me and we had a concall and a few email exchanges about his implementation that uses an HTML parser on the document.write output. You can try the code for free.

Wrapping it up

ControlJS has many features including some that aren’t found in other script loading modules:

  • downloads scripts asynchronously
  • handles both inline scripts and external scripts
  • delays script execution until after the page has rendered
  • allows for scripts to be downloaded and not executed
  • integrates with simple changes to HTML – no code changes
  • solves some document.write async use cases
  • control.js itself is loaded asynchronously

ControlJS is open source. I hope some of these features are borrowed and added to other script loaders. There’s still more features to implement (async stylesheets, better document.write handling) and ControlJS has had minimal testing. If you’d like to add features or fix bugs please take a look at the ControlJS project on Google Code and contact the group via the ControlJS discussion list.

20 Comments

ControlJS part 2: delayed execution

December 15, 2010 11:11 pm | 15 Comments

This is the second of three blog posts about ControlJS – a JavaScript module for making scripts load faster. The three blog posts describe how ControlJS is used for async loading, delayed execution, and overriding document.write.

The goal of ControlJS is to give developers more control over how JavaScript is loaded. The key insight is to recognize that “loading” has two phases: download (fetching the bytes) and execution (including parsing). In ControlJS part 1 I focused on the download phase using ControlJS to download scripts asynchronously. In this post I focus on the benefits ControlJS brings to the execution phase of script loading.

The issue with execution

The issue with the execution phase is that while the browser is parsing and executing JavaScript it can’t do much else. From a performance perspective this means the browser UI is unresponsive, page rendering stops, and the browser won’t start any new downloads.

The execution phase can take more time than you might think. I don’t have hard numbers on how much time is spent in this phase (although it wouldn’t be that hard to collect this data with dynaTrace Ajax Edition or Speed Tracer). If you do anything computationally intensive with a lot of DOM interaction the blocking effects can be noticeable. If you have a large amount of JavaScript just parsing the code can take a significant amount of time.

If all the JavaScript was used immediately to render the page we might have to throw up our hands and endure these delays, but it turns out that a lot of the JavaScript downloaded isn’t used right away. Across the Alexa US Top 10 only 29% of the downloaded JavaScript is called before the window load event. (I surveyed this using Page Speed‘s “defer JavaScript” feature.) The other 71% of the code is parsed and executed even though it’s not used for rendering the initial page. For these pages an average of 229 kB of JavaScript is downloaded. That 229 kB is compressed – the uncompressed size is upwards of 500 kB. That’s a lot of JavaScript. Rather than parse and execute 71% of the JavaScript in the middle of page loading, it would be better to avoid that pain until after the page is done rendering. But how can a developer do that?

Loading feature code

To ease our discussion let’s call that 29% of code used to render the page immediate-code and we’ll call the other 71% feature-code. The feature-code is typically for DHTML and Ajax features such as drop down menus, popup dialogs, and friend requests where the code is only executed if the user exercises the feature. (That’s why it’s not showing up in the list of functions called before window onload.)

Let’s assume you’ve successfully split your JavaScript into immediate-code and feature-code. The immediate-code scripts are loaded as part of the initial page loading process using ControlJS’s async loading capabilities. The additional scripts that contain the feature-code could also be loaded this way during the initial page load process, but then the browser is going to lock up parsing and executing code that’s not immediately needed. So we don’t want to load the feature-code during the initial page loading process.

Another approach would be to load the feature-code after the initial page is fully rendered. The scripts could be loaded in the background as part of an onload handler. But even though the feature-code scripts are downloaded in the background, the browser still locks up when it parses and executes them. If the user tries to interact with the page the UI is unresponsive. This unnecessary delay is painful enough that the Gmail mobile team went out of their way to avoid it. To make matters worse, the user has no idea why this is happening since they didn’t do anything to cause the lock up. (We kicked it off “in the background”.)

The solution is to parse and execute this feature-code when the user needs it. For example, when the user first clicks on the drop down menu is when we parse and execute menu.js. If the user never uses the drop down menu, then we avoid the cost of parsing and executing that code altogether. But when the user clicks on that menu we don’t want her to wait for the bytes to be downloaded – that would take too long especially on mobile. The best solution is to download the script in the background but delay execution until later when the code is actually needed.

Download now, Execute later

After that lengthy setup we’re ready to look at the delayed execution capabilities of ControlJS. I created an example that has a drop down menu containing the links to the examples.

In order to illustrate the pain from lengthy script execution I added a 2 second delay to the menu code (a while loop that doesn’t relinquish until 2 seconds have passed). Menu withOUT ControlJS is the baseline case. The page takes a long time to render because it’s blocked while the scripts are downloaded and also during the 2 seconds of script execution.

On the other hand, Menu WITH ControlJS renders much more quickly. The scripts are downloaded asynchronously. Since these scripts are for an optional feature we want to avoid executing them until later. This is achieved by using the “data-cjsexec=false” attribute, in addition to the other ControlJS modifications to the SCRIPT’s TYPE and SRC attributes:

<script data-cjsexec=false type="text/cjs" data-cjssrc="jquery.min.js"></script>
<script data-cjsexec=false type="text/cjs" data-cjssrc="fg.menu.js"></script>

The “data-cjsexec=false” setting means that the scripts are downloaded and stored in the cache, but they’re not executed. The execution is triggered later if and when the user exercises the feature. In this case, clicking on the Examples button is the trigger:

examplesbtn.onclick = function() {
   CJS.execScript("jquery.min.js");
   CJS.execScript("fg.menu.js", createExamplesMenu);
};

CJS.execScript() creates a SCRIPT element with the specified SRC and inserts it into the DOM. The menu creation function, createExamplesMenu, is passed in as the onload callback function for the last script. The 2 second delay is therefore incurred the first time the user clicks on the menu, but it’s tied to a the user’s action, there’s no delay to download the script, and the execution delay only occurs once.

The ability to separate and control the download phase and the execution phase during script loading is a key differentiator of ControlJS. Many websites won’t need this option. But websites that have a lot of code that’s not used to render the initial page, such as Ajax web apps, will benefit from avoiding the pain of script parsing and execution until it’s absolutely necessary.

15 Comments

ControlJS part 1: async loading

December 15, 2010 10:45 pm | 30 Comments

This is the first of three blog posts about ControlJS – a JavaScript module for making scripts load faster. The three blog posts describe how ControlJS is used for async loading, delayed execution, and overriding document.write.

The #1 performance best practice I’ve been evangelizing over the past year is progressive enhancement: deliver the page as HTML so it renders quickly, then enhance it with JavaScript. There are too many pages that are blank while several hundred kB of JavaScript is downloaded, parsed, and executed so that the page can be created using the DOM.

It’s easy to evangelize progressive enhancement – it’s much harder to actually implement it. That several hundred kB of JavaScript has to be unraveled and reorganized. All the logic that created DOM elements in the browser via JavaScript has to be migrated to run on the server to generate HTML. Even with new server-side JavaScript capabilities this is a major redesign effort.

I keep going back to Opera’s Delayed Script Execution option. Just by enabling this configuration option JavaScript is moved aside so that page rendering can come first. And the feature works – I can’t find a single website the suffers any errors with this turned on. While I continue to encourage other browser vendors to implement this feature, I want a way for developers to get this behavior sooner rather than later.

A number of web accelerators (like Aptimize, Strangeloop, FastSoft, CloudFlare, Torbit, and more recently mod_pagespeed) have emerged over the last year or so. They modify the HTML markup to inject performance best practices into the page. Thinking about this model I considered ways that markup could be changed to more easily delay the impact of JavaScript on progressive rendering.

The result is a JavaScript module I call ControlJS.

Controlling download and execution

The goal of ControlJS is to give developers more control over how JavaScript is loaded. The key insight is to recognize that “loading” has two phases: download (fetching the bytes) and execution (including parsing). These two phases need to be separated and controlled for best results.

Controlling how scripts are downloaded is a popular practice among performance-minded developers. The issue is that when scripts are loaded the normal way (<script src=""...) they block other resource downloads (lesser so in newer browsers) and also block page rendering (in all browsers). Using asynchronous script loading techniques mitigates these issues to a large degree. I wrote about several asynchronous loading techniques in Even Faster Web Sites. LABjs and HeadJS are JavaScript modules that provide wrappers for async loading. While these async techniques address the blocking issues that occur during the script download phase, they don’t address what happens during the script execution phase.

Page rendering and new resource downloads are blocked during script execution. With the amount of JavaScript on today’s web pages ever increasing, the blocking that occurs during script execution can be significant, especially for mobile devices. In fact, the Gmail mobile team thought script execution was such a problem they implemented a new async technique that downloads JavaScript wrapped in comments. This allows them to separate the download phase from the execution phase. The JavaScript is downloaded so that it resides on the client (in the browser cache) but since it’s a comment there’s no blocking from execution. Later, when the user invokes the associated features, the JavaScript is executed by removing the comment and evaluating the code.

Stoyan Stefanov, a former teammate and awesome performance wrangler, recently blogged about preloading JavaScript without execution. His approach is to download the script as either an IMAGE or an OBJECT element (depending on the browser). Assuming the proper caching headers exist the response is cached on the client and can later be inserted as a SCRIPT element. This is the technique used in ControlJS.

ControlJS: how to use it

To use ControlJS you need to make three modifications to your page.

Modification #1: add control.js

I think it’s ironic that JavaScript modules for loading scripts asynchronously have to be loaded in a blocking manner. From the beginning I wanted to make sure that ControlJS itself could be loaded asynchronously. Here’s the snippet for doing that:

var cjsscript = document.createElement('script');
cjsscript.src = "control.js";
var cjssib = document.getElementsByTagName('script')[0];
cjssib.parentNode.insertBefore(cjsscript, cjssib);

Modification #2: change external scripts

The next step is to transform all of the old style external scripts to load the ControlJS way. The normal style looks like this:

<script type="text/javascript" src="main.js"><script>

The SCRIPT element’s TYPE attribute needs to be changed to “text/cjs” and the SRC attribute needs to be changed to DATA-CJSSRC, like this:

<script type="text/cjs" data-cjssrc="main.js"><script>

Modification #3: change inline scripts

Most pages have inline scripts in addition to external scripts. These scripts have dependencies: inline scripts might depend on external scripts for certain symbols, and vice versa. It’s important that the execution order of the inline scripts and external scripts is preserved. (This is a feature that some of the async wrappers overlook.)  Therefore, inline scripts must also be converted by changing the TYPE attribute from “text/javascript” in the normal syntax:

<script type="text/javascript">
var name = getName();
<script>

to “text/cjs”, like this:

<script type="text/cjs">
var name = getName();
<script>

That’s it! ControlJS takes care of the rest.

ControlJS: how it works

Your existing SCRIPTs no longer block the page because the TYPE attribute has been changed to something the browser doesn’t recognize. This allows ControlJS to take control and load the scripts in a more high performance way. Let’s take a high-level look at how ControlJS works. You can also view the control.js script for more details.

We want to start downloading the scripts as soon as possible. Since we’re downloading them as an IMAGE or OBJECT they won’t block the page during the download phase. And since they’re not being downloaded as a SCRIPT they won’t be executed.  ControlJS starts by finding all the SCRIPT elements that have the “text/cjs” type. If the script has a DATA-CJSSRC attribute then an IMAGE (for IE and Opera) or OBJECT (for all other browsers) is created dynamically with the appropriate URL. (See Stoyan’s post for the full details.)

By default ControlJS waits until the window load event before it begins the execution phase. (It’s also possible to have it start executing scripts immediately or once the DOM is loaded.) ControlJS iterates over its scripts a second time, doing so in the order they appear in the page. If the script is an inline script the code is extracted and evaluated. If the script is an external script and its IMAGE or OBJECT download has completed then it’s inserted in the page as a SCRIPT element so that the code is parsed and executed. If the IMAGE or OBJECT download hasn’t finished then it reenters the iteration after a short timeout and tries again.

There’s more functionality I’ll talk about in later posts regarding document.write and skipping the execution step. For now, let’s look at a simple async loading example.

Async example

To show ControlJS’s async loading capabilities I’ve created an example that contains three scripts in the HEAD:

  • main.js – takes 4 seconds to download
  • an inline script that references symbols from main.js
  • page.js – takes 2 seconds to download and references symbols from the inline script

I made page.js take less time than main.js to make sure that the scripts are executed in the correct order (even though page.js downloads more quickly). I include the inline script because this is a pattern I see frequently (e.g., Google Analytics) but many script loader helpers don’t support inline scripts as part of the execution order.

Async withOUT ControlJS is the baseline example. It loads the scripts in the normal way. The HTTP waterfall chart generated in IE8 (using HttpWatch) is shown in Figure 1. IE8 is better than IE 6&7 – it loads scripts in parallel so main.js and page.js are downloaded together. But all versions of IE block image downloads until scripts are done, so images 1-4 get pushed out. Rendering is blocked by scripts in all browsers. In this example, main.js blocks rendering for four seconds as indicated by the green vertical line.


Figure 1: Async withOUT ControlJS waterfall chart (IE8)

Async WITH ControlJS demonstrates how ControlJS solves the blocking problems caused by scripts. Unlike the baseline example, the scripts and images are all downloaded in parallel shaving 1 second off the overall download time. Also, rendering starts immediately. If you load the two examples in your browser you’ll notice how dramatically faster the ControlJS page feels. There are three more requests in Figure 2′s waterfall chart. One is the request for control.js – this is loaded asynchronously so it doesn’t slow down the page. The other two requests are because main.js and page.js are loaded twice. The first time is when they are downloaded asynchronously as IMAGEs. Later, ControlJS inserts them into the page as SCRIPTs in order to get their JavaScript executed. Because main.js and page.js are already in the cache there’s no download penalty, only the short cache read indicated by the skinny blue line.


Figure 2: Async WITH ControlJS waterfall chart (IE8)

The ControlJS project

ControlJS is open source under the Apache License. The control.js script can be found in the ControlJS Google Code project. Discussion topics can be created in the ControlJS Google Group. The examples and such are on my website at the ControlJS Home Page. I’ve written this code as a proof of concept. I’ve only tested it on the major browsers. It needs more review before being used in a production environment.

Only part 1

This is only the first of three blog posts about ControlJS. There’s more to learn. This technique might seem like overkill. If all we wanted to do was load scripts asynchronously some other techniques might suffice. But my goal is to delay all scripts until after the page has rendered. This means that we have to figure out how to delay scripts that do document.write. Another goal is to support requirements like Gmail mobile had – the desire to download JavaScript but delay the blocking penalties that come during script execution. I’ll be talking about those features in the next two blog posts.

var name = getName();

30 Comments

Evolution of Script Loading

December 6, 2010 8:06 am | 14 Comments

Velocity China starts tomorrow morning. I’m kicking it off with a keynote and am finishing up my slides. I’ll be talking about progressive enhancement and smart script loading. As I reviewed the slides it struck me how the techniques for loading external scripts have changed significantly in the last few years. Let’s take a look at what we did, where we are now, and try to see what the future might be.

Scripts in the HEAD

Just a few years ago most pages, even for the top websites, loaded their external scripts with HTML in the HEAD of the document:

<head>
<script src=”core.js” type=”text/javascript”></script>
<script src=”more.js” type=”text/javascript”></script>
</head>

Developers in tune with web performance best practices cringe when they see scripts loaded this way. We know that older browsers (now that’s primarily Internet Explorer 6 & 7) load scripts sequentially. The browser downloads core.js and then parses and executes it. Then it does the same with more.js – downloads the script and then parses and executes it.

In addition to loading scripts sequentially, older browsers block other downloads until this sequential script loading phase is completed. This means there may be a significant delay before these browsers even start downloading stylesheets, images, and other resources in the page.

These download issues are mitigated in newer browsers (see my script loading roundup). Starting with IE 8, Firefox 3.6, Chrome 2, and Safari 4 scripts generally get downloaded in parallel with each other as well as with other types of resources. I say “generally” because there are still quirks in how browsers perform this parallel script loading. In IE 8 and 9beta, for example, images are blocked from being downloaded during script downloads (see this example). In a more esoteric use case, Firefox 3 loads scripts that are document.written into the page sequentially rather than in parallel.

Then there’s the issue with rendering being blocked by scripts: any DOM elements below a SCRIPT tag won’t be rendered until that script is finished loading. This is painful. The browser has already downloaded the HTML document but none of those Ps, DIVs, and ULs get shown to the user if there’s a SCRIPT above them. If you put the SCRIPT in the document HEAD then the entire page is blocked from rendering.

There are a lot of nuances in how browsers load scripts especially with the older browsers. Once I fully understood the impact scripts had on page loading I came up with my first recommendation to improve script loading:

Move Scripts to the Bottom

Back in 2006 and 2007 when I started researching faster script loading, all browsers had the same problems when it came to loading scripts:

  • Scripts were loaded sequentially.
  • Loading a script blocked all other downloads in the page.
  • Nothing below the SCRIPT was rendered until the script was done loading.

The solution I came up with was to move the SCRIPT tags to the bottom of the page.

...
<script src=”core.js” type=”text/javascript”></script>
<script src=”more.js” type=”text/javascript”></script>
</body>

This isn’t always possible, for example, scripts for ads that do document.write can’t be moved – they have to do their document.write in the exact spot where the ad is supposed to appear. But many scripts can be moved to the bottom of the page with little or no work. The benefits are immediately obvious – images download sooner and the page renders more quickly. This was one of my top recommendations in High Performance Web Sites. Many websites adopted this change and saw the benefits.

Load Scripts Asynchronously

Moving scripts to the bottom of the page avoided some problems, but other script loading issues still existed. During 2008 and 2009 browsers still downloaded scripts sequentially. There was an obvious opportunity here to improve performance. Although it’s true that scripts (often) need to be executed in order, they don’t need to be downloaded in order. They can be downloaded in any order – as long as the browser preserves the original order of execution.

Browser vendors realized this. (I like to think that I had something to do with that.) And newer browsers (starting with IE8, Firefox 3.6, Chrome 2, and Safari 4 as mentioned before) started loading scripts in parallel. But back in 2008 & 2009 sequential script loading was still an issue. I was analyzing MSN.com one day and noticed that their scripts loaded in parallel – even though this was back in the Firefox 2.0 days. They were using the Script DOM Element approach:

var se = document.createElement("script");
se.src = "core.js";
document.getElementsByTagName("head")[0].appendChild(se);

I’ve spent a good part of the last few years researching asynchronous script loading techniques like this. These async techniques (summarized in this blog post with full details in chapter 4 of Even Faster Web Sites) achieve parallel script loading in older browsers and avoid some of the quirks in newer browsers. They also mitigate the issues with blocked rendering: when a script is loaded using an async technique the browser charges ahead and renders the page while the script is being downloaded. This example has a script in the HEAD that’s loaded using the Script DOM Element technique. This script is configured to take 4 seconds to download. When you load the URL you’ll see the page render immediately, proving that rendering proceeds when scripts are loaded asynchronously.

Increased download parallelism and faster rendering – what more could you want? Well…

Async + On-demand Execution

Loading scripts asynchronously speeds up downloads (more parallelism) and rendering. But – when the scripts arrive at the browser rendering stops and the browser UI is locked while the script is parsed and executed. There wouldn’t be any room for improvement here if all that JavaScript was needed immediately, but websites don’t use all the code that’s downloaded – at least not right away. The Alexa US Top 10 websites download an average of 229 kB of JavaScript (compressed) but only execute 29% of those functions by the time the load event fires. The other 71% of code is cruft, conditional blocks, or most likely DHTML and Ajax functionality that aren’t used to render the initial page.

This discovery led to my recommendation to split the initial JavaScript download into the code needed to render the page and the code that can be loaded later for DHTML and Ajax features. (See this blog post or chapter 3 of EFWS.) Websites often load the code that’s needed later in the window’s onload handler. The Gmail Mobile team found wasn’t happy with the UI locking up when that later code arrived at the browser. After all, this DHTML/Ajaxy code might not even be used. They’re the first folks I saw who figured out a way to separate the download phase from the parse-execute phase of script loading. They did this by wrapping all the code in a comment, and then when the code is needed removing the comment delimiters and eval’ing. Gmail’s technique uses iframes so requires changing your existing scripts. Stoyan has gone on to explore using image and object tags to download scripts without the browser executing them, and then doing the parse-execute when the code is needed.

What’s Next?

Web pages are getting more complex. Advanced developers who care about performance need more control over how the page loads. Giving developers the ability to control when an external script is parsed and executed makes sense. Right now it’s way too hard. Luckily, help is on the horizon. Support for LINK REL=”PREFETCH” is growing. This provides a way to download scripts while avoiding parsing and execution. Browser vendors need to make sure the LINK tag has a load event so developers can know whether or not the file has finished downloading. Then the file that’s already in the browser’s cache can be added asynchronously using the Script DOM Element approach or some other technique only when the code is actually needed.

We’re close to having the pieces to easily go to the next level of script loading optimization. Until all the pieces are there, developers on the cutting edge will have to do what they always do – work a little harder and stay on top of the latest best practices. For full control over when scripts are downloaded and when they’re parsed and executed I recommend you take a look at the posts from Gmail Mobile and Stoyan.

Based on the past few years I’m betting there are  more improvements to script loading still ahead.

14 Comments

Render first. JS second.

September 30, 2010 11:10 am | 28 Comments

Let me start with the takeaway point:

The key to creating a fast user experience in today’s web sites is to render the page as quickly as possible. To achieve this JavaScript loading and execution has to be deferred.

I’m in the middle of several big projects so my blogging rate is down. But I got an email today about asynchronous JavaScript loading and execution. I started to type up my lengthy response and remembered one of those tips for being more productive: “type shorter emails – no one reads long emails anyway”. That just doesn’t resonate with me. I like typing long emails. I love going into the details. But, I agree that an email response that only a few people might read is not the best investment of time. So I’m writing up my response here.

It took me months to research and write the “Loading Scripts Without Blocking” chapter from Even Faster Web Sites. Months for a single chapter! I wasn’t the first person to do async script loading – I noticed it on MSN way before I started that chapter – but that work paid off. There has been more research on async script loading from folks like Google, Facebook and Meebo. Most JavaScript frameworks have async script loading features – two examples are YUI and LABjs. And 8 of today’s Alexa Top 10 US sites use advanced techniques to load scripts without blocking: Google, Facebook, Yahoo!, YouTube, Amazon, Twitter, Craigslist(!), and Bing. Yay!

The downside is – although web sites are doing a better job of downloading scripts without blocking, once those scripts arrive their execution still blocks the page from rendering. Getting the content in front of the user as quickly as possible is the goal. If asynchronous scripts arrive while the page is loading, the browser has to stop rendering in order to parse and execute those scripts. This is the biggest obstacle to creating a fast user experience. I don’t have scientific results that I can cite to substantiate this claim (that’s part of the big projects I’m working on). But anyone who disables JavaScript in their browser can attest that sites feel twice as fast.

My #1 goal right now is to figure out ways that web sites can defer all JavaScript execution until after the page has rendered. Achieving this goal is going to involve advances from multiple camps – changes to browsers, new web development techniques, and new pieces of infrastructure. I’ve been talking this up for a year or so. When I mention this idea these are the typical arguments I hear for why this won’t work:

JavaScript execution is too complex
People point out that: “JavaScript is a powerful language and developers use it in weird, unexpected ways. Eval and document.write create unexpected dependencies. Blanketedly delaying all JavaScript execution is going to break too many web sites.”

In response to this argument I point to Opera’s Delayed Script Execution feature. I encourage you to turn it on, surf around, and  try to find a site that breaks. Even sites like Gmail and Facebook work! I’m sure there are some sites that have problems (perhaps that’s why this feature is off by default). But if some sites do have problems, how many sites are we talking about? And what’s the severity of the problems? We definitely don’t want errors, rendering problems, or loss of ad revenue. Even though Opera has had this feature for over two years (!), I haven’t heard much discussion about it. Imagine what could happen if significant resources focused on this problem.

the page content is actually generated by JS execution
Yes, this happens too much IMO. The typical logic is: “We’re building an Ajax app so the code to create and manipulate DOM elements has to be written in JavaScript. We could also write that logic on the serverside (in C++, Java, PHP, Python, etc.), but then we’re maintaining two code bases. Instead, we download everything as JSON and create the DOM in JavaScript on the client – even for the initial page view.”
If this is the architecture for your web app, then it’s true – you’re going to have to download and execute (a boatload) of JavaScript before the user can see the page. “But it’s only for the first page view!” The first page view is the most important one. “People start our web app and then leave it running all day, so they only incur the initial page load once.” Really? Have you verified this? In my experience, users frequently close their tab or browser, or even reboot. “But then at least they have the scripts in their cache.” Even so, the scripts still have to be executed and they’re probably going to have to refetch the JSON responses that include data that could have changed.
Luckily, there’s a strong movement toward server-side JavaScript – see Doug Crockford’s Loopage talk and node.js. This would allow that JavaScript code to run on the server, render the DOM, and serve it as HTML to the browser so that it renders quickly without needing JavaScript. The scripts needed for subsequent dynamic Ajaxy behavior can be downloaded lazily after the page has rendered. I expect we’ll see more solutions to address this particular problem of serving the entire page as HTML, even parts that are rendered using JavaScript
doing this actually makes the page slower
The crux of this argument centers around how you define “slower”. The old school way of measuring performance is to look at how long it takes for the window’s load event to fire. (It’s so cool to call onload “old school”, but a whole series of blog posts on better ways to measure performance is called for.)
It is true that delaying script execution could result in larger onload times. Although delaying scripts until after onload is one solution to this problem, it’s more important to talk about performance yardsticks. These days I focus on how quickly the critical page content renders. Many of Amazon’s pages, for example, have a lot of content and consequently can have a large onload time, but they do an amazing job of getting the content above-the-fold to render quickly.
WebPagetest.org‘s video comparison capabilities help hammer this home. Take a look at this video of Techcrunch, Mashable, and a few other tech news sites loading. This forces you to look at it from the user’s perspective. It doesn’t really matter when the load event fired, what matters is when you can see the page. Reducing render time is more in tune with creating a faster user experience, and delaying JavaScript is key to reducing render time.

What are the next steps?

  • Browsers should look at Opera’s behavior and implement the SCRIPT ASYNC and DEFER attributes.
  • Developers should adopt asynchronous script loading techniques and avoid rendering the initial page view with JavaScript on the client.
  • Third party snippet providers, most notably ads, need to move away from document.write.
Rendering first and executing JavaScript second is the key to a faster Web.

28 Comments

newtwitter performance analysis

September 22, 2010 10:51 pm | 15 Comments

Among the exciting launches last week was newtwitter – the amazing revamp of the Twitter UI. I use Twitter a lot and was all over the new release, and so was stoked to see this tweet from Twitter developer Ben Cherry:

The new Twitter UI looks good, but how does it score when it comes to performance? I spent a few hours investigating. I always start with HTTP waterfall charts, typically generated by HttpWatch. I look at Firefox and IE because they’re the most popular browsers (and I use a lot of Firefox tools). Here’s the waterfall chart for Firefox 3.6.10:

I used to look at IE7 but its market share is dropping, so now I start with IE8. Here’s the waterfall chart for IE8:

From the waterfall charts I generate a summary to get an idea of the overall page size and potential for problems.

Firefox 3.6.10 IE8
main content rendered ~4 secs ~5 secs
window onload ~2 secs ~7 secs
network activity done ~5 secs ~7 secs
# requests 53 52
total bytes downloaded 428 kB 442 kB
JS bytes downloaded 181 kB 181 kB
CSS bytes downloaded 21 kB 21 kB

I study the waterfall charts looking for problems, primarily focused on places where parallel downloading stops and where there are white gaps. Then I run Page Speed and YSlow for more recommendations. Overall newtwitter does well, scoring 90 on Page Speed and 86 on YSlow. Combining all of this investigation results in this list of performance suggestions:

  1. Script Loading – The most important thing to do for performance is get JavaScript out of the way. “Out of the way” has two parts:
    • blocking downloads – Twitter is now using the new Google Analytics async snippet (yay!) so ga.js isn’t blocking. However, base.bundle.js is loaded using normal SCRIPT SRC, so it blocks. In newer browsers there will be some parallel downloads, but even IE9 will block images until the script is done downloading. In both Firefox and IE it’s clear this is causing a break in parallel downloads. phoenix.bundle.js and api.bundle.js are loaded using LABjs in conservative mode. In both Firefox and IE there’s a big block after these two scripts. It could be LABjs missing an opportunity, or it could be that the JavaScript is executing. But it’s worth investigating why all the resources lower in the page (~40 of them) don’t start downloading until after these scripts are fetched. It’d be better to get more parallelized downloads.
    • blocked rendering – The main part of the page takes 4-5 seconds to render. In some cases this is because the browser won’t render anything until all the JavaScript is downloaded. However, in this page the bulk of the content is generated by JS. (I concluded this after seeing that many of the image resources are specified in JS.) Rather than download a bunch of scripts to dynamically create the DOM, it’d be better to do this on the server side as HTML as part of the main HTML document. This can be a lot of work, but the page will never be fast for users if they have to wait for JavaScript to draw the main content in the page. After the HTML is rendered, the scripts can be downloaded in the background to attach dynamic behavior to the page.
  2. scattered inline scripts – Fewer inline scripts are better. The main reason is that a stylesheet followed by an inline script blocks subsequent downloads. (See Positioning Inline Scripts.) In this page, phoenix.bundle.css is followed by the Google Analytics inline script. This will cause the resources below that point to be blocked – in this case the images. It’d be better to move the GA snippet to the SCRIPT tag right above the stylesheet.
  3. Optimize Images - This one image alone could be optimized to save over 40K (out of 52K): http://s.twimg.com/a/1285108869/phoenix/img/tweet-dogear.png.
  4. Expires header is missing - Strangely, some images are missing both the Expires and Cache-Control headers, for example, http://a2.twimg.com/profile_images/30575362/48_normal.png. My guess is this is an origin server push problem.
  5. Cache-Control header is missing – Scripts from Amazon S3 have an Expires header, but no Cache-Control header (eg http://a2.twimg.com/a/1285097693/javascripts/base.bundle.js). This isn’t terrible, but it’d be good to include Cache-Control: max-age. The reason is that Cache-Control: max-age is relative (“# of seconds from right now”) whereas Expires is absolute (“Wed, 21 Sep 2011 20:44:22 GMT”). If the client has a skewed clock the actual cache time could be different than expected. In reality this happens infrequently.
  6. redirect - http://www.twitter.com/ redirects to http://twitter.com/. I love using twitter.com instead of www.twitter.com, but for people who use “www.” it would be better not to redirect them. You can check your logs and see how often this happens. If it’s small (< 1% of unique users per day) then it’s not a big problem.
  7. two spinners – Could one of these spinners be eliminated: http://twitter.com/images/spinner.gif and http://twitter.com/phoenix/img/loader.gif ?
  8. mini before profile – In IE the “mini” images are downloaded before the normal “profile” images. I think the profile images are more important, and I wonder why the order in IE is different than Firefox.
  9. CSS cleanup – Page Speed reports that there are > 70 very inefficient CSS selectors.
  10. Minify - Minifying the HTML document would save ~5K. Not big, but it’s three roundtrips for people on slow connections.

Overall newtwitter is beautiful and fast. Most of the Top 10 web sites are using advanced techniques for loading JavaScript. That’s the key to making today’s web apps fast.

15 Comments

Diffable: only download the deltas

July 9, 2010 9:31 am | 15 Comments

There were many new products and projects announced at Velocity, but one that I just found out about is Diffable. It’s ironic that I missed this one given that it happened at Velocity and is from Google. The announcement was made during a whiteboard talk, so it didn’t get much attention. If your web site has large JavaScript downloads you’ll want to learn more about this performance optimization technique.

The Diffable open source project has plenty of information, including the Diffable slides used by Josh Harrison and James deBoer at Velocity. As explained in the slides, Diffable uses differential compression to reduce the size of JavaScript downloads. It makes a lot of sense. Suppose your web site has a large external script. When a new release comes out, it’s often the case that a bulk of that large script is unchanged. And yet, users have to download the entire new script even if the old script is still cached.

Josh and James work on Google Maps which has a main script that is ~300K. A typical revision for this 300K script produces patches that are less than 20K. It’s wasteful to download that other 280K if the user has the old revision in their cache. That’s the inspiration for Diffable.

Diffable is implemented on the server and the client. The server component records revision deltas so it can return a patch to bring older versions up to date. The client component (written in JavaScript) detects if an older version is cached and if necessary requests the patch to the current version. The client component knows how to merge the patch with the cached version and evals the result.

The savings are significant. Using Diffable has reduced page load times in Google Maps by more than 1200 milliseconds (~25%). Note that this benefit only affects users that have an older version of the script in cache. For Google Maps that’s 20-25% of users.

In this post I’ve used scripts as the example, but Diffable works with other resources including stylesheets and HTML. The biggest benefit is with scripts because of their notorious blocking behavior. The Diffable slides contain more information including how JSON is used as the delta format, stats that show there’s no performance hit for using eval, and how Diffable also causes the page to be enabled sooner due to faster JavaScript execution. Give it a look.

15 Comments

Velocity: Google Maps API performance

July 7, 2010 1:09 pm | 1 Comment

Several months ago I saw Susannah Raub do an internal tech talk on the performance improvements behind Google Maps API v3. She kindly agreed to reprise the talk at Velocity. Luckily it was videotaped, and the slides (ODP) are available, too. It’s a strong case study on improving performance, is valuable for developers working with the Google Maps API, and has a few takeaways that I’ll blog about more soon.

Susannah starts off bravely by showing how Google Maps API v2 takes 17 seconds to load on an iPhone. This was the motivation for the work on v3 – to improve performance. In order to improve performance you have to start by measuring it. The Google Maps team broke down “performance” into three categories:

  • user perceived latency – how long it takes for the page to appear usable, in this case for the map to be rendered
  • page ready time - how long it takes for the page to become usable, e.g. for the map to be draggable
  • page load time - how long it takes for all the elements to be present, in the case of maps this includes all of the map controls to be loaded and working

The team wanted to measure all of these areas. It’s fairly easy to find tools to measure performance on the desktop – the Google Maps teamed used HttpWatch. Performance tools, or any development tools for that matter, are harder to come by in the mobile space. But the team especially wanted to focus on creating a fast experience on mobile devices. They ended up using Fiddler as a proxy to gain visibility into the page’s performance profile.

future blog post #1: Coincidentally, today I saw a tweet about Craig Dunn’s instructions for Monitoring iPhone web traffic (with Fiddler). This is a huge takeaway for anyone doing web development for mobile. At Velocity, Eric Lawrence (creator of Fiddler) announced Fiddler support for the HTTP Archive Specification. The HTTP Archive (HAR) format is a specification I initiated over a year ago with folks from HttpWatch and Firebug. HAR is becoming the industry standard just as I had hoped and is now supported in numerous developer tools. I wrote one such tool, called HAR to Page Speed, that takes a HAR file and displays a Page Speed performance analysis as well as an HTTP waterfall chart. Putting all these pieces together, you can now load a web site on your iPhone, monitor it with Fiddler, export it to a HAR file, and upload it to HAR to Page Speed to find out how it performs. Given Fiddler’s extensive capabilities for creating addons, I expect it won’t be long before all of this is built into Fiddler itself.

In the case of Google Maps API, the long pole in the tent was main.js. They have a small (15K) bootstrap script that loads main.js (180K). (All of the script sizes in this blog post are UNcompressed sizes.) The performance impact of main.js was especially bad on mobile devices because of less caching. They compiled their JavaScript (using Closure) and combined three HTTP requests into one.

future blog post #2: The team also realized that although their JavaScript download was large, the revisions between releases was small. They created a framework for only downloading deltas when possible that cut seconds off their download times. More on this tomorrow.

These performance improvements helped, but they wanted to go further. They redesigned their code using an MVC architecture. As a result, the initial download only needs to include the models, which are small. The larger views and controllers that do all the heavy lifting are loaded asynchronously. This reduced the initial bootstrap script from 15K to 4K, and the main.js from 180K to 33K.

The results speak for themselves. Susannah concludes by showing how v3 of Google Maps API takes only 5 seconds to load on the iPhone, compared to v2′s 17 seconds. The best practices the team employed for making Google Maps faster are valuable for anyone working on JavaScript-heavy web sites. Take a look at the video and slides, and watch here for a follow-up on Fiddler for iPhone and loading JavaScript deltas.

1 Comment