2010 State of Performance

December 24, 2010 3:24 pm | Comments Off on 2010 State of Performance

I wrote today’s post on the Performance Calendar titled “2010 State of Performance”. Here’s the concluding paragraph:

The highlights of 2010 for me were the emergence of WPO as an industry, establishment of the W3C Web Performance Working Group, strength of open source tools, adoption of the HAR format, and increased awareness of the impact of third party content. In 2011 I’m looking forward to better browser benchmarks and instrumentation, mobile tools and best practices, and faster ads. But the list is much longer than this blog post – I didn’t even mention separation of script downloading and execution, HTML5 pros and cons, improvements to browser caching, and TCP and SSL optimizations. What did you think was important in 2010 and where will the big gains come from in 2011? I think we’ll agree on one thing – the only direction to go in is faster.

Go read the full post and leave your thoughts about web performance in 2010 and 2011. We’ve got another exciting year ahead of us.


ControlJS part 3: overriding document.write

December 15, 2010 11:18 pm | 22 Comments

This is the third of three blog posts about ControlJS – a JavaScript module for making scripts load faster. The three blog posts describe how ControlJS is used for async loading, delayed execution, and overriding document.write.

The goal of ControlJS is to give developers more control over how JavaScript is loaded. The key insight is to recognize that “loading” has two phases: download (fetching the bytes) and execution (including parsing). In ControlJS part 1 I focused on the download phase using ControlJS to download scripts asynchronously. ControlJS part 2 showed how delaying script execution makes pages load faster, especially for Ajax web apps that download a lot of JavaScript that’s not used immediately. In this post I look at how document.write causes issues when loading scripts asynchronously.

I like ControlJS, but I’ll admit now that I was unable to solve the document.write problem to my satisfaction. Nevertheless, I describe the dent I put in the problem of document.write and point out two alternatives that do a wonderful job of making async work in spite of document.write.

Async and document.write

ControlJS loads scripts asynchronously. By default these async scripts are executed after window onload. If one of these async scripts calls document.write the page is erased. That’s because calling document.write after the document is loaded automatically calls document.open. Anything written to the open document replaces the existing document content. Obviously, this is not the desired behavior.

I wanted ControlJS to work on all scripts in the page. Scripts for ads are notorious for using document.write, but I also found scripts created by website owners that use document.write. I didn’t want those scripts to be inadvertently loaded with ControlJS and have the page get wiped out.

The problem was made slightly easier for me because ControlJS is in control when each inline and external script is being executed. My solution is to override document.write and capture the output for each script one at a time. If there’s no document.write output then proceed to the next script. If there is output then I create an element (for now a SPAN), insert it into the DOM immediately before the SCRIPT tag that’s being executed, and set the SPAN’s innerHTML to the output from document.write.
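The capture step described above can be sketched roughly as follows. This is not the code from control.js: captureWrites and placeOutput are hypothetical names, and the document object is passed in as a parameter (an assumption made here so the sketch can be exercised outside a browser).

```javascript
// Illustrative sketch only; not the actual control.js implementation.

// Temporarily replaces doc.write/doc.writeln, runs one script's code,
// and returns everything that script tried to write.
function captureWrites(doc, runScript) {
  var buffer = [];
  var origWrite = doc.write;
  var origWriteln = doc.writeln;
  doc.write = function () {
    // document.write concatenates all of its arguments
    buffer.push(Array.prototype.join.call(arguments, ""));
  };
  doc.writeln = function () {
    buffer.push(Array.prototype.join.call(arguments, "") + "\n");
  };
  try {
    runScript();
  } finally {
    // Always restore the originals, even if the script throws
    doc.write = origWrite;
    doc.writeln = origWriteln;
  }
  return buffer.join("");
}

// Wraps the captured output in a SPAN placed immediately before the SCRIPT.
// Returns null when the script wrote nothing.
function placeOutput(doc, scriptEl, html) {
  if (!html) { return null; }
  var span = doc.createElement("span");
  span.innerHTML = html;
  scriptEl.parentNode.insertBefore(span, scriptEl);
  return span;
}
```

In a page this would be called once per script, for example captureWrites(document, function () { /* evaluate one script */ }), followed by placeOutput(document, thatScriptElement, output).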

This works pretty well. I downloaded a local copy of CNN.com and converted it to ControlJS. There were two places where document.write was used (including an ad) and both worked fine. But there are issues. One issue I encountered is that wrapping the output in a SPAN can break certain behavior like hover handlers. But a bigger issue is that browsers ignore “<script…” added via innerHTML, and one of the main ways that ads use document.write is to insert scripts. I have a weak solution to this that parses out the “<script…” HTML and creates a SCRIPT DOM element with the appropriate SRC. It works, but is not very robust.
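To make the “parse out the script tag” idea concrete, here is a minimal sketch. The function name splitScripts and the regular expressions are assumptions for illustration, not what control.js does, and a regex like this is fragile in the same way described above (it will misread unusual markup).

```javascript
// Illustrative sketch only: pulls external <script src=...> tags out of
// captured document.write output so they can be re-created as DOM elements.
function splitScripts(html) {
  var srcs = [];
  var scriptTag = /<script\b([^>]*)>[\s\S]*?<\/script\s*>/gi;
  var srcAttr = /\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i;
  var rest = html.replace(scriptTag, function (match, attrs) {
    var m = srcAttr.exec(attrs);
    var src = m && (m[1] || m[2] || m[3]);
    if (!src) {
      // Inline scripts are left where they are in this sketch
      return match;
    }
    srcs.push(src);
    return "";
  });
  return { html: rest, srcs: srcs };
}
```

The remaining html can go into the SPAN via innerHTML, and each entry in srcs can be loaded by creating a SCRIPT element with that SRC.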

ControlJS is still valuable even without a robust answer for document.write. You don’t have to convert every script in the page to use ControlJS. If you know an ad or widget uses document.write you can load those scripts the normal way. Or you can test and see how complex the document.write is – perhaps it’s a use case that ControlJS handles.

Real solutions for async document.write

I can’t end this series of posts without highlighting two alternatives that have solved the asynchronous document.write issue: Aptimize and GhostWriter.

Aptimize sells a web accelerator that changes HTML markup on the fly to have better performance. In September they announced the capability to load scripts asynchronously – including ads. As far as I know this is the first product that offered this feature. I had lunch with Ed and Derek from Aptimize this week and found out more about their implementation. It sounds solid.

Another solution is GhostWriter from Digital Fulcrum. Will contacted me and we had a concall and a few email exchanges about his implementation that uses an HTML parser on the document.write output. You can try the code for free.

Wrapping it up

ControlJS has many features including some that aren’t found in other script loading modules:

downloads scripts asynchronously
handles both inline scripts and external scripts
delays script execution until after the page has rendered
allows for scripts to be downloaded and not executed
integrates with simple changes to HTML – no code changes
solves some document.write async use cases
control.js itself is loaded asynchronously

ControlJS is open source. I hope some of these features are borrowed and added to other script loaders. There are still more features to implement (async stylesheets, better document.write handling) and ControlJS has had minimal testing. If you’d like to add features or fix bugs please take a look at the ControlJS project on Google Code and contact the group via the ControlJS discussion list.


ControlJS part 2: delayed execution

December 15, 2010 11:11 pm | 15 Comments

This is the second of three blog posts about ControlJS – a JavaScript module for making scripts load faster. The three blog posts describe how ControlJS is used for async loading, delayed execution, and overriding document.write.

The goal of ControlJS is to give developers more control over how JavaScript is loaded. The key insight is to recognize that “loading” has two phases: download (fetching the bytes) and execution (including parsing). In ControlJS part 1 I focused on the download phase using ControlJS to download scripts asynchronously. In this post I focus on the benefits ControlJS brings to the execution phase of script loading.

The issue with execution

The issue with the execution phase is that while the browser is parsing and executing JavaScript it can’t do much else. From a performance perspective this means the browser UI is unresponsive, page rendering stops, and the browser won’t start any new downloads.

The execution phase can take more time than you might think. I don’t have hard numbers on how much time is spent in this phase (although it wouldn’t be that hard to collect this data with dynaTrace Ajax Edition or Speed Tracer). If you do anything computationally intensive with a lot of DOM interaction the blocking effects can be noticeable. If you have a large amount of JavaScript just parsing the code can take a significant amount of time.

If all the JavaScript was used immediately to render the page we might have to throw up our hands and endure these delays, but it turns out that a lot of the JavaScript downloaded isn’t used right away. Across the Alexa US Top 10 only 29% of the downloaded JavaScript is called before the window load event. (I surveyed this using Page Speed’s “defer JavaScript” feature.) The other 71% of the code is parsed and executed even though it’s not used for rendering the initial page. For these pages an average of 229 kB of JavaScript is downloaded. That 229 kB is compressed – the uncompressed size is upwards of 500 kB. That’s a lot of JavaScript. Rather than parse and execute 71% of the JavaScript in the middle of page loading, it would be better to avoid that pain until after the page is done rendering. But how can a developer do that?

Loading feature code

To ease our discussion let’s call that 29% of code used to render the page immediate-code and we’ll call the other 71% feature-code. The feature-code is typically for DHTML and Ajax features such as drop down menus, popup dialogs, and friend requests where the code is only executed if the user exercises the feature. (That’s why it’s not showing up in the list of functions called before window onload.)

Let’s assume you’ve successfully split your JavaScript into immediate-code and feature-code. The immediate-code scripts are loaded as part of the initial page loading process using ControlJS’s async loading capabilities. The additional scripts that contain the feature-code could also be loaded this way during the initial page load process, but then the browser is going to lock up parsing and executing code that’s not immediately needed. So we don’t want to load the feature-code during the initial page loading process.

Another approach would be to load the feature-code after the initial page is fully rendered. The scripts could be loaded in the background as part of an onload handler. But even though the feature-code scripts are downloaded in the background, the browser still locks up when it parses and executes them. If the user tries to interact with the page the UI is unresponsive. This unnecessary delay is painful enough that the Gmail mobile team went out of their way to avoid it. To make matters worse, the user has no idea why this is happening since they didn’t do anything to cause the lock up. (We kicked it off “in the background”.)

The solution is to parse and execute this feature-code when the user needs it. For example, when the user first clicks on the drop down menu is when we parse and execute menu.js. If the user never uses the drop down menu, then we avoid the cost of parsing and executing that code altogether. But when the user clicks on that menu we don’t want her to wait for the bytes to be downloaded – that would take too long especially on mobile. The best solution is to download the script in the background but delay execution until later when the code is actually needed.

Download now, Execute later

After that lengthy setup we’re ready to look at the delayed execution capabilities of ControlJS. I created an example that has a drop down menu containing the links to the examples.

In order to illustrate the pain from lengthy script execution I added a 2 second delay to the menu code (a while loop that doesn’t relinquish until 2 seconds have passed). Menu withOUT ControlJS is the baseline case. The page takes a long time to render because it’s blocked while the scripts are downloaded and also during the 2 seconds of script execution.
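The menu code from the example isn’t reproduced here, but a blocking delay of the kind described could look something like this sketch. The name busyWait is hypothetical.

```javascript
// Illustrative sketch only: blocks the calling thread for roughly `ms`
// milliseconds. In a browser this freezes rendering and the UI, which is
// exactly the effect the example is meant to demonstrate.
function busyWait(ms) {
  var start = Date.now();
  while (Date.now() - start < ms) {
    // spin without yielding
  }
  return Date.now() - start; // actual time spent, in milliseconds
}
```

Calling busyWait(2000) at the top of a script would approximate the 2 second execution penalty used in the example.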

On the other hand, Menu WITH ControlJS renders much more quickly. The scripts are downloaded asynchronously. Since these scripts are for an optional feature we want to avoid executing them until later. This is achieved by using the “data-cjsexec=false” attribute, in addition to the other ControlJS modifications to the SCRIPT’s TYPE and SRC attributes:

<script data-cjsexec=false type="text/cjs" data-cjssrc="jquery.min.js"></script>
<script data-cjsexec=false type="text/cjs" data-cjssrc="fg.menu.js"></script>

The “data-cjsexec=false” setting means that the scripts are downloaded and stored in the cache, but they’re not executed. The execution is triggered later if and when the user exercises the feature. In this case, clicking on the Examples button is the trigger:

examplesbtn.onclick = function() {
   CJS.execScript("jquery.min.js");
   CJS.execScript("fg.menu.js", createExamplesMenu);
};

CJS.execScript() creates a SCRIPT element with the specified SRC and inserts it into the DOM. The menu creation function, createExamplesMenu, is passed in as the onload callback function for the last script. The 2 second delay is therefore incurred the first time the user clicks on the menu, but it’s tied to the user’s action, there’s no delay to download the script, and the execution delay only occurs once.
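For readers who want to see the general shape of such a function, here is a rough sketch of inserting a SCRIPT element with an optional callback. It is not the real CJS.execScript: the name execScriptSketch is made up, the document is passed in as a parameter for illustration, and the readyState handling is the usual pattern for older IE rather than anything confirmed from control.js.

```javascript
// Illustrative sketch only; not the actual CJS.execScript.
function execScriptSketch(doc, src, onload) {
  var se = doc.createElement("script");
  se.src = src;
  if (onload) {
    var done = false;
    // Older IE fires onreadystatechange instead of onload
    se.onload = se.onreadystatechange = function () {
      var rs = se.readyState;
      if (done || (rs && rs !== "loaded" && rs !== "complete")) { return; }
      done = true;
      se.onload = se.onreadystatechange = null; // fire only once
      onload();
    };
  }
  // Insert before the first script in the page, which always exists
  var first = doc.getElementsByTagName("script")[0];
  first.parentNode.insertBefore(se, first);
  return se;
}
```

Because the script bytes are already in the browser cache, inserting the element this way should cost only the parse and execute time.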

The ability to separate and control the download phase and the execution phase during script loading is a key differentiator of ControlJS. Many websites won’t need this option. But websites that have a lot of code that’s not used to render the initial page, such as Ajax web apps, will benefit from avoiding the pain of script parsing and execution until it’s absolutely necessary.


ControlJS part 1: async loading

December 15, 2010 10:45 pm | 30 Comments

This is the first of three blog posts about ControlJS – a JavaScript module for making scripts load faster. The three blog posts describe how ControlJS is used for async loading, delayed execution, and overriding document.write.

The #1 performance best practice I’ve been evangelizing over the past year is progressive enhancement: deliver the page as HTML so it renders quickly, then enhance it with JavaScript. There are too many pages that are blank while several hundred kB of JavaScript is downloaded, parsed, and executed so that the page can be created using the DOM.

It’s easy to evangelize progressive enhancement – it’s much harder to actually implement it. That several hundred kB of JavaScript has to be unraveled and reorganized. All the logic that created DOM elements in the browser via JavaScript has to be migrated to run on the server to generate HTML. Even with new server-side JavaScript capabilities this is a major redesign effort.

I keep going back to Opera’s Delayed Script Execution option. Just by enabling this configuration option JavaScript is moved aside so that page rendering can come first. And the feature works – I can’t find a single website that suffers any errors with this turned on. While I continue to encourage other browser vendors to implement this feature, I want a way for developers to get this behavior sooner rather than later.

A number of web accelerators (like Aptimize, Strangeloop, FastSoft, CloudFlare, Torbit, and more recently mod_pagespeed) have emerged over the last year or so. They modify the HTML markup to inject performance best practices into the page. Thinking about this model I considered ways that markup could be changed to more easily delay the impact of JavaScript on progressive rendering.

The result is a JavaScript module I call ControlJS.

Controlling download and execution

The goal of ControlJS is to give developers more control over how JavaScript is loaded. The key insight is to recognize that “loading” has two phases: download (fetching the bytes) and execution (including parsing). These two phases need to be separated and controlled for best results.

Controlling how scripts are downloaded is a popular practice among performance-minded developers. The issue is that when scripts are loaded the normal way (<script src=""...) they block other resource downloads (less so in newer browsers) and also block page rendering (in all browsers). Using asynchronous script loading techniques mitigates these issues to a large degree. I wrote about several asynchronous loading techniques in Even Faster Web Sites. LABjs and HeadJS are JavaScript modules that provide wrappers for async loading. While these async techniques address the blocking issues that occur during the script download phase, they don’t address what happens during the script execution phase.

Page rendering and new resource downloads are blocked during script execution. With the amount of JavaScript on today’s web pages ever increasing, the blocking that occurs during script execution can be significant, especially for mobile devices. In fact, the Gmail mobile team thought script execution was such a problem they implemented a new async technique that downloads JavaScript wrapped in comments. This allows them to separate the download phase from the execution phase. The JavaScript is downloaded so that it resides on the client (in the browser cache) but since it’s a comment there’s no blocking from execution. Later, when the user invokes the associated features, the JavaScript is executed by removing the comment and evaluating the code.
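The comment-wrapping idea can be illustrated with a small sketch. Gmail mobile’s actual code isn’t shown in this post, so the function names and the exact wrapper format (a single /* ... */ block around the whole payload) are assumptions.

```javascript
// Illustrative sketch only. The payload arrives as one big block comment,
// so the browser downloads it without parsing or executing the code inside.

// Removes a surrounding /* ... */ wrapper, if present.
function stripCommentWrapper(text) {
  var m = /^\s*\/\*([\s\S]*)\*\/\s*$/.exec(text);
  return m ? m[1] : text;
}

// Evaluates the unwrapped code in the global scope (indirect eval),
// typically called when the user first invokes the feature.
function runDeferred(text) {
  return (0, eval)(stripCommentWrapper(text));
}
```

One limitation of this simple wrapper: the wrapped code can’t itself contain the sequence */, since that would end the comment early.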

Stoyan Stefanov, a former teammate and awesome performance wrangler, recently blogged about preloading JavaScript without execution. His approach is to download the script as either an IMAGE or an OBJECT element (depending on the browser). Assuming the proper caching headers exist the response is cached on the client and can later be inserted as a SCRIPT element. This is the technique used in ControlJS.

ControlJS: how to use it

To use ControlJS you need to make three modifications to your page.

Modification #1: add control.js

I think it’s ironic that JavaScript modules for loading scripts asynchronously have to be loaded in a blocking manner. From the beginning I wanted to make sure that ControlJS itself could be loaded asynchronously. Here’s the snippet for doing that:

var cjsscript = document.createElement('script');
cjsscript.src = "control.js";
var cjssib = document.getElementsByTagName('script')[0];
cjssib.parentNode.insertBefore(cjsscript, cjssib);

Modification #2: change external scripts

The next step is to transform all of the old style external scripts to load the ControlJS way. The normal style looks like this:

<script type="text/javascript" src="main.js"></script>

The SCRIPT element’s TYPE attribute needs to be changed to “text/cjs” and the SRC attribute needs to be changed to DATA-CJSSRC, like this:

<script type="text/cjs" data-cjssrc="main.js"></script>

Modification #3: change inline scripts

Most pages have inline scripts in addition to external scripts. These scripts have dependencies: inline scripts might depend on external scripts for certain symbols, and vice versa. It’s important that the execution order of the inline scripts and external scripts is preserved. (This is a feature that some of the async wrappers overlook.) Therefore, inline scripts must also be converted by changing the TYPE attribute from “text/javascript” in the normal syntax:

<script type="text/javascript">
var name = getName();
</script>

to “text/cjs”, like this:

<script type="text/cjs">
var name = getName();
</script>

That’s it! ControlJS takes care of the rest.

ControlJS: how it works

Your existing SCRIPTs no longer block the page because the TYPE attribute has been changed to something the browser doesn’t recognize. This allows ControlJS to take control and load the scripts in a more high performance way. Let’s take a high-level look at how ControlJS works. You can also view the control.js script for more details.

We want to start downloading the scripts as soon as possible. Since we’re downloading them as an IMAGE or OBJECT they won’t block the page during the download phase. And since they’re not being downloaded as a SCRIPT they won’t be executed. ControlJS starts by finding all the SCRIPT elements that have the “text/cjs” type. If the script has a DATA-CJSSRC attribute then an IMAGE (for IE and Opera) or OBJECT (for all other browsers) is created dynamically with the appropriate URL. (See Stoyan’s post for the full details.)
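A rough sketch of this download phase is below. It is not taken from control.js: findCjsScripts, preload, and preloadAll are hypothetical names, the document is passed in as a parameter, and the caller is assumed to decide (via the useImage flag) whether the browser is one that should use an IMAGE rather than an OBJECT.

```javascript
// Illustrative sketch only; not the actual control.js code.

// Collects every SCRIPT whose TYPE is "text/cjs", in document order.
function findCjsScripts(doc) {
  var all = doc.getElementsByTagName("script");
  var found = [];
  for (var i = 0; i < all.length; i++) {
    if (all[i].getAttribute("type") === "text/cjs") {
      // src is null for inline scripts
      found.push({ el: all[i], src: all[i].getAttribute("data-cjssrc") });
    }
  }
  return found;
}

// Starts fetching a URL into the cache without executing it.
function preload(doc, url, useImage) {
  if (useImage) {
    var img = doc.createElement("img");
    img.src = url; // setting src starts the request
    return img;
  }
  var obj = doc.createElement("object");
  obj.data = url;
  obj.width = 0;  // keep the element invisible
  obj.height = 0;
  doc.body.appendChild(obj); // OBJECT must be in the DOM to load
  return obj;
}

// Kicks off a preload for each external text/cjs script.
function preloadAll(doc, useImage) {
  var scripts = findCjsScripts(doc);
  for (var i = 0; i < scripts.length; i++) {
    if (scripts[i].src) {
      scripts[i].preloader = preload(doc, scripts[i].src, useImage);
    }
  }
  return scripts;
}
```

The returned list keeps inline and external scripts together in page order, which is what the execution phase needs later.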

By default ControlJS waits until the window load event before it begins the execution phase. (It’s also possible to have it start executing scripts immediately or once the DOM is loaded.) ControlJS iterates over its scripts a second time, doing so in the order they appear in the page. If the script is an inline script the code is extracted and evaluated. If the script is an external script and its IMAGE or OBJECT download has completed then it’s inserted in the page as a SCRIPT element so that the code is parsed and executed. If the IMAGE or OBJECT download hasn’t finished then it reenters the iteration after a short timeout and tries again.
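The execution loop can be sketched as a small queue runner. This is an illustration, not the control.js source: runQueue and the hooks object (later, insertScript, evalInline) are hypothetical, and waiting for each inserted script to finish before moving on is an assumption about how execution order is kept.

```javascript
// Illustrative sketch only. `queue` holds items in page order:
//   { src: "main.js", downloaded: false }   external script
//   { code: "var name = getName();" }       inline script
// `hooks` supplies the browser-specific pieces:
//   later(fn)                 retry fn after a short timeout
//   insertScript(src, done)   add a SCRIPT element, call done() once it has run
//   evalInline(code)          evaluate an inline script's text
function runQueue(queue, hooks) {
  function next() {
    if (!queue.length) { return; }
    var item = queue[0];
    if (item.src && !item.downloaded) {
      // Preload hasn't finished: re-enter the iteration later
      hooks.later(next);
      return;
    }
    queue.shift();
    if (item.src) {
      hooks.insertScript(item.src, next);
    } else {
      hooks.evalInline(item.code);
      next();
    }
  }
  next();
}
```

In a page, later could be implemented as function (fn) { setTimeout(fn, 50); }, where the 50 ms retry interval is an arbitrary choice for the sketch.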

There’s more functionality I’ll talk about in later posts regarding document.write and skipping the execution step. For now, let’s look at a simple async loading example.

Async example

To show ControlJS’s async loading capabilities I’ve created an example that contains three scripts in the HEAD:

main.js – takes 4 seconds to download
an inline script that references symbols from main.js
page.js – takes 2 seconds to download and references symbols from the inline script

I made page.js take less time than main.js to make sure that the scripts are executed in the correct order (even though page.js downloads more quickly). I include the inline script because this is a pattern I see frequently (e.g., Google Analytics) but many script loader helpers don’t support inline scripts as part of the execution order.

Async withOUT ControlJS is the baseline example. It loads the scripts in the normal way. The HTTP waterfall chart generated in IE8 (using HttpWatch) is shown in Figure 1. IE8 is better than IE 6&7 – it loads scripts in parallel so main.js and page.js are downloaded together. But all versions of IE block image downloads until scripts are done, so images 1-4 get pushed out. Rendering is blocked by scripts in all browsers. In this example, main.js blocks rendering for four seconds as indicated by the green vertical line.

Figure 1: Async withOUT ControlJS waterfall chart (IE8)

Async WITH ControlJS demonstrates how ControlJS solves the blocking problems caused by scripts. Unlike the baseline example, the scripts and images are all downloaded in parallel shaving 1 second off the overall download time. Also, rendering starts immediately. If you load the two examples in your browser you’ll notice how dramatically faster the ControlJS page feels. There are three more requests in Figure 2’s waterfall chart. One is the request for control.js – this is loaded asynchronously so it doesn’t slow down the page. The other two requests are because main.js and page.js are loaded twice. The first time is when they are downloaded asynchronously as IMAGEs. Later, ControlJS inserts them into the page as SCRIPTs in order to get their JavaScript executed. Because main.js and page.js are already in the cache there’s no download penalty, only the short cache read indicated by the skinny blue line.

Figure 2: Async WITH ControlJS waterfall chart (IE8)

The ControlJS project

ControlJS is open source under the Apache License. The control.js script can be found in the ControlJS Google Code project. Discussion topics can be created in the ControlJS Google Group. The examples and such are on my website at the ControlJS Home Page. I’ve written this code as a proof of concept. I’ve only tested it on the major browsers. It needs more review before being used in a production environment.

Only part 1

This is only the first of three blog posts about ControlJS. There’s more to learn. This technique might seem like overkill. If all we wanted to do was load scripts asynchronously some other techniques might suffice. But my goal is to delay all scripts until after the page has rendered. This means that we have to figure out how to delay scripts that do document.write. Another goal is to support requirements like Gmail mobile had – the desire to download JavaScript but delay the blocking penalties that come during script execution. I’ll be talking about those features in the next two blog posts.


Velocity China

December 10, 2010 10:54 am | 10 Comments

I returned from Velocity China yesterday. It was a great, great conference. It sold out with ~600 people – the main ballroom was standing room only for the morning keynoters. One thing I noticed that’s different in China – people fill in the seats from the front back, whereas in the US people have to be encouraged to move up to the front.

There were 35 sessions. I invited several speakers from the U.S.:

Changhao Jiang (Facebook) talked about some of the performance projects he’s worked on including BigPipe, Quickling, and PageCache. This was a great talk covering a lot of material. He was mobbed with questions afterward.
David Wei (also from Facebook) did two talks. The first was about managing static resources – caching, combining scripts, etc. The second was about the challenges of setting up a performance practice and culture within an organization such as Facebook. This is a topic I get asked about frequently and we need a talk like this at Velocity US.
Doug Crockford (Yahoo!) did two talks as well. His first was Using JavaScript Well – a focus on the good parts of JavaScript. The second talk was called There and Back Again – a much needed presentation about the potential of JavaScript running on the server.
Daniel Hunt (also from Yahoo!) did a talk on the performance optimizations behind the Yahoo! Mail rewrite. This also talked about serverside JavaScript and their use of the mustache templating system.
Alex Nicksay (YouTube) described four specific performance improvements made to YouTube. This talk was great because he included quantified load time results for each improvement. He also talked about their UIX widget system which should be open sourced soon. The slides contain almost all the code anyway.
I did two talks. One was a variation of my Even Faster Web Sites presentation with some new slides talking about how script loading has changed dramatically in just the last few years. My other talk was about the arrival of the WPO industry. This is a new talk that includes my view of the main phases of building an industry-wide focus on performance.

A special treat that I was able to pull together at the last minute was presentations from the Chrome, IE9, and Firefox 4 teams. These browser talks are always popular at the Velocity US conference. Web developers understand that having fast browsers is critical to improving the user experience, and these browser teams have responded with tremendous focus on performance the last few years.

The other sessions were arranged by my co-chair Zhang Wen Song (Taobao). The presentations in the main ballroom were translated to English, and I also sat through presentations in Chinese. The translated talks were very good mostly focusing on the operations side of Velocity – availability, scaling, CDNs, and more. I had trouble following the talks that were only in Chinese although slides that had code or charts are universal. But I could tell these were good speakers – they were at ease up on stage, they engaged the audience, and their slides looked good.

Here’s some of the back story around how Velocity China happened: Velocity China is the first conference O’Reilly has done in China. And it sold out! A great start. We held it in Beijing since O’Reilly has an office there. But it’s not a very big office – just four employees headed up by Michelle Chen. This team of four people organized their first conference and pulled off a great event. Huge credit to Michelle and her team. I got to meet all of them – Douglas, Donna, and Jian – they were incredible hosts taking us out to dinner every night and arranging drivers and tours.

O’Reilly partnered with Taobao on Velocity China. It was critical to have someone building up the conference program direction and content from China. The co-chair, Zhang Wen Song, arranged the program and also delivered a great opening keynote. Joshua Zhu, also from Taobao, helped with emceeing many of the Chinese sessions.

I want to thank O’Reilly and Taobao for making Velocity China happen. I also want to thank all the people I met there. I was overwhelmed by the enthusiasm of the developers. Even with the language challenges, I had deep conversations during the breaks with attendees. There’s no doubt that performance is a big focus in China. With the success of this conference I’m confident we’ll be back again next year.


Evolution of Script Loading

December 6, 2010 8:06 am | 14 Comments

Velocity China starts tomorrow morning. I’m kicking it off with a keynote and am finishing up my slides. I’ll be talking about progressive enhancement and smart script loading. As I reviewed the slides it struck me how the techniques for loading external scripts have changed significantly in the last few years. Let’s take a look at what we did, where we are now, and try to see what the future might be.

Scripts in the HEAD

Just a few years ago most pages, even for the top websites, loaded their external scripts with HTML in the HEAD of the document:

<head>
<script src="core.js" type="text/javascript"></script>
<script src="more.js" type="text/javascript"></script>
</head>

Developers in tune with web performance best practices cringe when they see scripts loaded this way. We know that older browsers (now that’s primarily Internet Explorer 6 & 7) load scripts sequentially. The browser downloads core.js and then parses and executes it. Then it does the same with more.js – downloads the script and then parses and executes it.

In addition to loading scripts sequentially, older browsers block other downloads until this sequential script loading phase is completed. This means there may be a significant delay before these browsers even start downloading stylesheets, images, and other resources in the page.

These download issues are mitigated in newer browsers (see my script loading roundup). Starting with IE 8, Firefox 3.6, Chrome 2, and Safari 4 scripts generally get downloaded in parallel with each other as well as with other types of resources. I say “generally” because there are still quirks in how browsers perform this parallel script loading. In IE 8 and 9beta, for example, images are blocked from being downloaded during script downloads (see this example). In a more esoteric use case, Firefox 3 loads scripts that are document.written into the page sequentially rather than in parallel.

Then there’s the issue with rendering being blocked by scripts: any DOM elements below a SCRIPT tag won’t be rendered until that script is finished loading. This is painful. The browser has already downloaded the HTML document but none of those Ps, DIVs, and ULs get shown to the user if there’s a SCRIPT above them. If you put the SCRIPT in the document HEAD then the entire page is blocked from rendering.

There are a lot of nuances in how browsers load scripts especially with the older browsers. Once I fully understood the impact scripts had on page loading I came up with my first recommendation to improve script loading:

Move Scripts to the Bottom

Back in 2006 and 2007 when I started researching faster script loading, all browsers had the same problems when it came to loading scripts:

Scripts were loaded sequentially.
Loading a script blocked all other downloads in the page.
Nothing below the SCRIPT was rendered until the script was done loading.

The solution I came up with was to move the SCRIPT tags to the bottom of the page.

...
<script src="core.js" type="text/javascript"></script>
<script src="more.js" type="text/javascript"></script>
</body>

This isn’t always possible, for example, scripts for ads that do document.write can’t be moved – they have to do their document.write in the exact spot where the ad is supposed to appear. But many scripts can be moved to the bottom of the page with little or no work. The benefits are immediately obvious – images download sooner and the page renders more quickly. This was one of my top recommendations in High Performance Web Sites. Many websites adopted this change and saw the benefits.

Load Scripts Asynchronously

Moving scripts to the bottom of the page avoided some problems, but other script loading issues still existed. During 2008 and 2009 browsers still downloaded scripts sequentially. There was an obvious opportunity here to improve performance. Although it’s true that scripts (often) need to be executed in order, they don’t need to be downloaded in order. They can be downloaded in any order – as long as the browser preserves the original order of execution.
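The idea of downloading in any order while executing in the original order can be shown with a small sketch. It doesn’t describe how any browser implements this; makeOrderedRunner and its callback shape are made up for illustration.

```javascript
// Illustrative sketch only. `urls` is the original script order; `run` is
// called with (url, body) for each script, strictly in that order, no matter
// which download finishes first.
function makeOrderedRunner(urls, run) {
  var bodies = {};
  var nextIndex = 0;
  return function onArrived(url, body) {
    bodies[url] = body;
    // Run every script that is now unblocked, in original order
    while (nextIndex < urls.length &&
           Object.prototype.hasOwnProperty.call(bodies, urls[nextIndex])) {
      var u = urls[nextIndex];
      run(u, bodies[u]);
      nextIndex++;
    }
  };
}
```

If more.js arrives before core.js, it is simply held until core.js has arrived and run.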

Browser vendors realized this. (I like to think that I had something to do with that.) And newer browsers (starting with IE8, Firefox 3.6, Chrome 2, and Safari 4 as mentioned before) started loading scripts in parallel. But back in 2008 & 2009 sequential script loading was still an issue. I was analyzing MSN.com one day and noticed that their scripts loaded in parallel – even though this was back in the Firefox 2.0 days. They were using the Script DOM Element approach:

var se = document.createElement("script");
se.src = "core.js";
document.getElementsByTagName("head")[0].appendChild(se);

I’ve spent a good part of the last few years researching asynchronous script loading techniques like this. These async techniques (summarized in this blog post with full details in chapter 4 of Even Faster Web Sites) achieve parallel script loading in older browsers and avoid some of the quirks in newer browsers. They also mitigate the issues with blocked rendering: when a script is loaded using an async technique the browser charges ahead and renders the page while the script is being downloaded. This example has a script in the HEAD that’s loaded using the Script DOM Element technique. This script is configured to take 4 seconds to download. When you load the URL you’ll see the page render immediately, proving that rendering proceeds when scripts are loaded asynchronously.
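As a rough sketch, the Script DOM Element technique can be wrapped in a small helper. `loadScriptAsync` and its `doc` and `onLoad` parameters are assumptions made for illustration (passing the document in makes the helper easy to try outside a browser); in a page you would pass `document`. Older versions of IE signal completion through `onreadystatechange` rather than `onload`, which this sketch does not handle.

```javascript
// Hypothetical wrapper around the Script DOM Element approach.
function loadScriptAsync(doc, src, onLoad) {
  var se = doc.createElement("script");
  se.src = src;
  if (onLoad) {
    se.onload = onLoad; // called after the script has loaded and executed
  }
  // Appending the element starts the download without blocking rendering.
  doc.getElementsByTagName("head")[0].appendChild(se);
  return se;
}

// In a browser:
// loadScriptAsync(document, "core.js", function () { /* initialize */ });
```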

Increased download parallelism and faster rendering – what more could you want? Well…

Async + On-demand Execution

Loading scripts asynchronously speeds up downloads (more parallelism) and rendering. But – when the scripts arrive at the browser rendering stops and the browser UI is locked while the script is parsed and executed. There wouldn’t be any room for improvement here if all that JavaScript was needed immediately, but websites don’t use all the code that’s downloaded – at least not right away. The Alexa US Top 10 websites download an average of 229 kB of JavaScript (compressed) but only execute 29% of those functions by the time the load event fires. The other 71% of code is cruft, conditional blocks, or most likely DHTML and Ajax functionality that aren’t used to render the initial page.

This discovery led to my recommendation to split the initial JavaScript download into the code needed to render the page and the code that can be loaded later for DHTML and Ajax features. (See this blog post or chapter 3 of EFWS.) Websites often load the code that’s needed later in the window’s onload handler. The Gmail Mobile team wasn’t happy with the UI locking up when that later code arrived at the browser. After all, this DHTML/Ajaxy code might not even be used. They’re the first folks I saw who figured out a way to separate the download phase from the parse-execute phase of script loading. They did this by wrapping all the code in a comment, and then when the code is needed removing the comment delimiters and eval’ing. Gmail’s technique uses iframes, so it requires changing your existing scripts. Stoyan has gone on to explore using image and object tags to download scripts without the browser executing them, and then doing the parse-execute when the code is needed.
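The comment-and-eval idea can be illustrated with a simplified sketch. This is not Gmail Mobile's actual code: the module text, `stripCommentDelimiters`, and `activate` are made up for the example, and the string stands in for script text that has been downloaded but not yet evaluated.

```javascript
// Downloaded text: the whole module sits inside a block comment, so an
// engine that received it as a script would find no code to compile or run.
var lazySource =
  "/*\n" +
  "({ greet: function (name) { return 'Hello, ' + name; } })\n" +
  "*/";

// Remove the leading "/*" and trailing "*/" to recover the real code.
function stripCommentDelimiters(text) {
  return text.replace(/^\s*\/\*/, "").replace(/\*\/\s*$/, "");
}

// Pay the parse-and-execute cost only when the feature is first needed.
var lazyModule = null;
function activate() {
  if (lazyModule === null) {
    lazyModule = eval(stripCommentDelimiters(lazySource));
  }
  return lazyModule;
}
```

Calling `activate()` a second time returns the already evaluated module, so the cost is paid once. The wrapped code must not itself contain `*/`, or the comment would end early.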

What’s Next?

Web pages are getting more complex. Advanced developers who care about performance need more control over how the page loads. Giving developers the ability to control when an external script is parsed and executed makes sense. Right now it’s way too hard. Luckily, help is on the horizon. Support for LINK REL=”PREFETCH” is growing. This provides a way to download scripts while avoiding parsing and execution. Browser vendors need to make sure the LINK tag has a load event so developers can know whether or not the file has finished downloading. Then the file that’s already in the browser’s cache can be added asynchronously using the Script DOM Element approach or some other technique only when the code is actually needed.
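A sketch of how those pieces might fit together follows. It assumes the browser supports LINK REL="PREFETCH" and, as called for above, fires a load event on the LINK tag, which is not guaranteed. `prefetchScript` and `executeScript` are hypothetical helper names, and `doc` stands in for `document`.

```javascript
// Phase 1: download only. The intent is that the file lands in the
// browser's cache without being parsed or executed.
function prefetchScript(doc, url, onDownloaded) {
  var link = doc.createElement("link");
  link.rel = "prefetch";
  link.href = url;
  link.onload = onDownloaded; // assumption: the browser fires load on LINK
  doc.getElementsByTagName("head")[0].appendChild(link);
  return link;
}

// Phase 2: parse and execute, only when the code is actually needed.
// Uses the Script DOM Element approach; ideally the file is served from cache.
function executeScript(doc, url) {
  var se = doc.createElement("script");
  se.src = url;
  doc.getElementsByTagName("head")[0].appendChild(se);
  return se;
}

// In a browser:
// prefetchScript(document, "more.js", function () { /* download finished */ });
// ...later, when the feature is first used:
// executeScript(document, "more.js");
```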

We’re close to having the pieces to easily go to the next level of script loading optimization. Until all the pieces are there, developers on the cutting edge will have to do what they always do – work a little harder and stay on top of the latest best practices. For full control over when scripts are downloaded and when they’re parsed and executed I recommend you take a look at the posts from Gmail Mobile and Stoyan.

Based on the past few years I’m betting there are more improvements to script loading still ahead.


Web Directions South, Sydney photos

October 18, 2010 5:53 pm | 4 Comments

I just got back from Web Directions South in Sydney. John Allsopp and team do a great job. I and many others complimented John on pulling together great speakers at an awesome venue. I especially appreciated the smaller things he did to make a great conference – informal talks during lunch, great coffee and afternoon tea, and an incredible hack day called Amped. About a hundred folks were there on a Saturday creating amazing apps. The winning team, which included Tatham Oddie, built a conference app and also started a website dedicated to a conference data API standard – in one day.

The best talk for me was Tom Hughes-Croucher (@sh1mmer) talking about Doing Horrible Things to DNS. I’m not sure why he called it “horrible” – it’s actually a great performance gain. He described a technique for returning resolutions for multiple CNAMEs in a single response. This is a typical scenario for any web site doing domain sharding. I also enjoyed watching a live release mid-session from Ben Schwarz (@BenSchwarz). He developed the W3 specification styles Greasemonkey script. Working with Michael Smith from the W3C, those styles are being incorporated into the specs themselves.

In addition to great speakers, there was an energetic vibe across all the attendees. The breaks were packed with great conversations and Q&A. I saw more of this JS community vibe the night before the conference when I attended SydJS. Over a hundred JavaScript developers packed the downstairs of the Atlassian offices for a panel of experts.

I spent Sunday sightseeing before flying home today. The sights and people were great. I want to share two photos:

On my walk back from Bondi Beach I passed through a neighborhood with this street sign for Sustainability Street with the added tagline “It’s a village out there.” A nice community boost in the middle of Sydney. At the Botanical Gardens this morning I enjoyed this sign at the entrance directing me to “Please walk on the grass.” I was further invited to “smell the roses, hug the trees, talk to the birds, and picnic on the lawns.” In the bottom right it says “Plants=Life”. It was a great final touch of Sydney before heading home.


Render first. JS second.

September 30, 2010 11:10 am | 28 Comments

Let me start with the takeaway point:

The key to creating a fast user experience in today’s web sites is to render the page as quickly as possible. To achieve this JavaScript loading and execution has to be deferred.

I’m in the middle of several big projects so my blogging rate is down. But I got an email today about asynchronous JavaScript loading and execution. I started to type up my lengthy response and remembered one of those tips for being more productive: “type shorter emails – no one reads long emails anyway”. That just doesn’t resonate with me. I like typing long emails. I love going into the details. But, I agree that an email response that only a few people might read is not the best investment of time. So I’m writing up my response here.

It took me months to research and write the “Loading Scripts Without Blocking” chapter from Even Faster Web Sites. Months for a single chapter! I wasn’t the first person to do async script loading – I noticed it on MSN way before I started that chapter – but that work paid off. There has been more research on async script loading from folks like Google, Facebook and Meebo. Most JavaScript frameworks have async script loading features – two examples are YUI and LABjs. And 8 of today’s Alexa Top 10 US sites use advanced techniques to load scripts without blocking: Google, Facebook, Yahoo!, YouTube, Amazon, Twitter, Craigslist(!), and Bing. Yay!

The downside is – although web sites are doing a better job of downloading scripts without blocking, once those scripts arrive their execution still blocks the page from rendering. Getting the content in front of the user as quickly as possible is the goal. If asynchronous scripts arrive while the page is loading, the browser has to stop rendering in order to parse and execute those scripts. This is the biggest obstacle to creating a fast user experience. I don’t have scientific results that I can cite to substantiate this claim (that’s part of the big projects I’m working on). But anyone who disables JavaScript in their browser can attest that sites feel twice as fast.

My #1 goal right now is to figure out ways that web sites can defer all JavaScript execution until after the page has rendered. Achieving this goal is going to involve advances from multiple camps – changes to browsers, new web development techniques, and new pieces of infrastructure. I’ve been talking this up for a year or so. When I mention this idea these are the typical arguments I hear for why this won’t work:

JavaScript execution is too complex

People point out that: “JavaScript is a powerful language and developers use it in weird, unexpected ways. Eval and document.write create unexpected dependencies. Blanketedly delaying all JavaScript execution is going to break too many web sites.”

In response to this argument I point to Opera’s Delayed Script Execution feature. I encourage you to turn it on, surf around, and try to find a site that breaks. Even sites like Gmail and Facebook work! I’m sure there are some sites that have problems (perhaps that’s why this feature is off by default). But if some sites do have problems, how many sites are we talking about? And what’s the severity of the problems? We definitely don’t want errors, rendering problems, or loss of ad revenue. Even though Opera has had this feature for over two years (!), I haven’t heard much discussion about it. Imagine what could happen if significant resources focused on this problem.

the page content is actually generated by JS execution

Yes, this happens too much IMO. The typical logic is: “We’re building an Ajax app so the code to create and manipulate DOM elements has to be written in JavaScript. We could also write that logic on the serverside (in C++, Java, PHP, Python, etc.), but then we’re maintaining two code bases. Instead, we download everything as JSON and create the DOM in JavaScript on the client – even for the initial page view.”

If this is the architecture for your web app, then it’s true – you’re going to have to download and execute (a boatload) of JavaScript before the user can see the page. “But it’s only for the first page view!” The first page view is the most important one. “People start our web app and then leave it running all day, so they only incur the initial page load once.” Really? Have you verified this? In my experience, users frequently close their tab or browser, or even reboot. “But then at least they have the scripts in their cache.” Even so, the scripts still have to be executed and they’re probably going to have to refetch the JSON responses that include data that could have changed.

Luckily, there’s a strong movement toward server-side JavaScript – see Doug Crockford’s Loopage talk and node.js. This would allow that JavaScript code to run on the server, render the DOM, and serve it as HTML to the browser so that it renders quickly without needing JavaScript. The scripts needed for subsequent dynamic Ajaxy behavior can be downloaded lazily after the page has rendered. I expect we’ll see more solutions to address this particular problem of serving the entire page as HTML, even parts that are rendered using JavaScript.
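As an illustration of the idea, the same rendering function can produce HTML on the server for the first page view and run again in the browser for later updates. The function names and data shape below (`renderItems`, `escapeHtml`, items with `user` and `text` fields) are invented for this sketch. It uses no node.js-specific API, so it should run unchanged in either environment.

```javascript
// Escape text so it is safe to place inside HTML markup.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn the JSON data into markup. On the server the result is written
// into the HTML response; on the client it can be assigned to innerHTML.
function renderItems(items) {
  var html = "<ul>";
  for (var i = 0; i < items.length; i++) {
    html += "<li><b>" + escapeHtml(items[i].user) + "</b> " +
            escapeHtml(items[i].text) + "</li>";
  }
  return html + "</ul>";
}
```

Sharing one function this way addresses the "two code bases" objection: the initial view arrives as plain HTML, and the script is only needed for later dynamic behavior.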

doing this actually makes the page slower

The crux of this argument centers around how you define “slower”. The old school way of measuring performance is to look at how long it takes for the window’s load event to fire. (It’s so cool to call onload “old school”, but a whole series of blog posts on better ways to measure performance is called for.)

It is true that delaying script execution could result in larger onload times. Although delaying scripts until after onload is one solution to this problem, it’s more important to talk about performance yardsticks. These days I focus on how quickly the critical page content renders. Many of Amazon’s pages, for example, have a lot of content and consequently can have a large onload time, but they do an amazing job of getting the content above-the-fold to render quickly.

WebPagetest.org‘s video comparison capabilities help hammer this home. Take a look at this video of Techcrunch, Mashable, and a few other tech news sites loading. This forces you to look at it from the user’s perspective. It doesn’t really matter when the load event fired, what matters is when you can see the page. Reducing render time is more in tune with creating a faster user experience, and delaying JavaScript is key to reducing render time.

What are the next steps?

Browsers should look at Opera’s behavior and implement the SCRIPT ASYNC and DEFER attributes.
Developers should adopt asynchronous script loading techniques and avoid rendering the initial page view with JavaScript on the client.
Third party snippet providers, most notably ads, need to move away from document.write.

Rendering first and executing JavaScript second is the key to a faster Web.


newtwitter performance analysis

September 22, 2010 10:51 pm | 15 Comments

Among the exciting launches last week was newtwitter – the amazing revamp of the Twitter UI. I use Twitter a lot and was all over the new release, and so was stoked to see this tweet from Twitter developer Ben Cherry:

The new Twitter UI looks good, but how does it score when it comes to performance? I spent a few hours investigating. I always start with HTTP waterfall charts, typically generated by HttpWatch. I look at Firefox and IE because they’re the most popular browsers (and I use a lot of Firefox tools). Here’s the waterfall chart for Firefox 3.6.10:

Twitter HTTP Waterfall in Firefox 3.6.10

I used to look at IE7 but its market share is dropping, so now I start with IE8. Here’s the waterfall chart for IE8:

From the waterfall charts I generate a summary to get an idea of the overall page size and potential for problems.

	Firefox 3.6.10	IE8
main content rendered	~4 secs	~5 secs
window onload	~2 secs	~7 secs
network activity done	~5 secs	~7 secs
# requests	53	52
total bytes downloaded	428 kB	442 kB
JS bytes downloaded	181 kB	181 kB
CSS bytes downloaded	21 kB	21 kB

I study the waterfall charts looking for problems, primarily focused on places where parallel downloading stops and where there are white gaps. Then I run Page Speed and YSlow for more recommendations. Overall newtwitter does well, scoring 90 on Page Speed and 86 on YSlow. Combining all of this investigation results in this list of performance suggestions:

Script Loading – The most important thing to do for performance is get JavaScript out of the way. “Out of the way” has two parts:
- blocking downloads – Twitter is now using the new Google Analytics async snippet (yay!) so ga.js isn’t blocking. However, base.bundle.js is loaded using normal SCRIPT SRC, so it blocks. In newer browsers there will be some parallel downloads, but even IE9 will block images until the script is done downloading. In both Firefox and IE it’s clear this is causing a break in parallel downloads. phoenix.bundle.js and api.bundle.js are loaded using LABjs in conservative mode. In both Firefox and IE there’s a big block after these two scripts. It could be LABjs missing an opportunity, or it could be that the JavaScript is executing. But it’s worth investigating why all the resources lower in the page (~40 of them) don’t start downloading until after these scripts are fetched. It’d be better to get more parallelized downloads.
- blocked rendering – The main part of the page takes 4-5 seconds to render. In some cases this is because the browser won’t render anything until all the JavaScript is downloaded. However, in this page the bulk of the content is generated by JS. (I concluded this after seeing that many of the image resources are specified in JS.) Rather than download a bunch of scripts to dynamically create the DOM, it’d be better to do this on the server side as HTML as part of the main HTML document. This can be a lot of work, but the page will never be fast for users if they have to wait for JavaScript to draw the main content in the page. After the HTML is rendered, the scripts can be downloaded in the background to attach dynamic behavior to the page.
scattered inline scripts – Fewer inline scripts are better. The main reason is that a stylesheet followed by an inline script blocks subsequent downloads. (See Positioning Inline Scripts.) In this page, phoenix.bundle.css is followed by the Google Analytics inline script. This will cause the resources below that point to be blocked – in this case the images. It’d be better to move the GA snippet to the SCRIPT tag right above the stylesheet.
Optimize Images – This one image alone could be optimized to save over 40K (out of 52K): http://s.twimg.com/a/1285108869/phoenix/img/tweet-dogear.png.
Expires header is missing – Strangely, some images are missing both the Expires and Cache-Control headers, for example, http://a2.twimg.com/profile_images/30575362/48_normal.png. My guess is this is an origin server push problem.
Cache-Control header is missing – Scripts from Amazon S3 have an Expires header, but no Cache-Control header (eg http://a2.twimg.com/a/1285097693/javascripts/base.bundle.js). This isn’t terrible, but it’d be good to include Cache-Control: max-age. The reason is that Cache-Control: max-age is relative (“# of seconds from right now”) whereas Expires is absolute (“Wed, 21 Sep 2011 20:44:22 GMT”). If the client has a skewed clock the actual cache time could be different than expected. In reality this happens infrequently.
redirect – http://www.twitter.com/ redirects to http://twitter.com/. I love using twitter.com instead of www.twitter.com, but for people who use “www.” it would be better not to redirect them. You can check your logs and see how often this happens. If it’s small (< 1% of unique users per day) then it’s not a big problem.
two spinners – Could one of these spinners be eliminated: http://twitter.com/images/spinner.gif and http://twitter.com/phoenix/img/loader.gif ?
mini before profile – In IE the “mini” images are downloaded before the normal “profile” images. I think the profile images are more important, and I wonder why the order in IE is different than Firefox.
CSS cleanup – Page Speed reports that there are > 70 very inefficient CSS selectors.
Minify – Minifying the HTML document would save ~5K. Not big, but it’s three roundtrips for people on slow connections.
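The Cache-Control versus Expires point in the list above can be illustrated with a little arithmetic. This is a simplified model that compares Expires directly with the client's clock; real HTTP caches also consult the Date and Age headers. The function names, the assumed response time, and the one-hour skew are all made up for the example.

```javascript
// Seconds a response stays fresh when judged by an absolute Expires date.
function lifetimeFromExpires(expiresHeader, clientNowMs) {
  return Math.max(0, (Date.parse(expiresHeader) - clientNowMs) / 1000);
}

// Seconds a response stays fresh with Cache-Control: max-age.
// The value is relative, so the client's clock setting does not matter.
function lifetimeFromMaxAge(maxAgeSeconds) {
  return maxAgeSeconds;
}

var expires = "Wed, 21 Sep 2011 20:44:22 GMT";        // Expires value quoted above
var trueNow = Date.UTC(2010, 8, 21, 20, 44, 22);      // assumed time of the response
var skewedNow = trueNow + 60 * 60 * 1000;             // client clock running 1 hour fast

var intended = lifetimeFromExpires(expires, trueNow); // 31536000 (365 days)
var skewed = lifetimeFromExpires(expires, skewedNow); // 31532400 (one hour short)
var relative = lifetimeFromMaxAge(31536000);          // 31536000 on any clock
```

An hour out of a year is harmless, which matches the observation that this rarely matters in practice. A client whose clock is off by more than the intended lifetime would treat the response as already expired.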

Overall newtwitter is beautiful and fast. Most of the Top 10 web sites are using advanced techniques for loading JavaScript. That’s the key to making today’s web apps fast.


Twitter vs blogging

September 15, 2010 2:21 pm | 8 Comments

My rate of blogging has dropped dramatically since I saw how @dalmaer was able to get so much news out via Twitter. I was slow to adopt Twitter, but now I love it. If you follow my blog but aren’t following me on Twitter, you should start following me: @souders. Here are some of the important tweets I’ve made in the last few days:

IE 9 Beta is now available in http://WebPagetest.org – let’s hear it for @patmeenan!
IE blog about need realistic benchmarks http://bit.ly/a3WR1c; Mozilla announces “Kraken focuses on realistic workloads” http://bit.ly/9wMGYn
VCs tend to back companies with fast web sites (via @joshuabixby) – http://bit.ly/aLXaIf
Do you use <meta name="viewport" content="width=device-width"> in your mobile web app? You should (via @ppk). http://bit.ly/9IuhzT
Browserscope shows IE9 beta’s network behavior is improved. I’m surprised scripts & images can’t download in parallel. http://bit.ly/dpileW
IE6 is now available on http://WebPagetest.org for Dulles, VA. Awesome work by Pat Meenan.
Great survey of how YSlow score relates to page load time from Yottaa: http://blog.yottaa.com/2010/09/how-important-is-my-yslow-score/
be careful using Web Inspector for performance analysis; JS execution can cause network events to be mistimed; Speed Tracer has same problem
Had to file a FF bug and forgot the URL. Then remembered this Browser Resources page from Browserscope: http://www.browserscope.org/browsers
About to start the first meeting of the W3C Web Performance Working Group: http://www.w3.org/2010/webperf/

You get the idea. I won’t do this blog-retweeting again, but if you care about web performance I hope you’ll follow me on Twitter.

@souders
