ControlJS part 3: overriding document.write

December 15, 2010 11:18 pm | 22 Comments

This is the third of three blog posts about ControlJS – a JavaScript module for making scripts load faster. The three blog posts describe how ControlJS is used for async loading, delayed execution, and overriding document.write.

The goal of ControlJS is to give developers more control over how JavaScript is loaded. The key insight is to recognize that “loading” has two phases: download (fetching the bytes) and execution (including parsing). In ControlJS part 1 I focused on the download phase using ControlJS to download scripts asynchronously. ControlJS part 2 showed how delaying script execution makes pages load faster, especially for Ajax web apps that download a lot of JavaScript that’s not used immediately. In this post I look at how document.write causes issues when loading scripts asynchronously.

I like ControlJS, but I’ll admit now that I was unable to solve the document.write problem to my satisfaction. Nevertheless, I describe the dent I put in the problem of document.write and point out two alternatives that do a wonderful job of making async work in spite of document.write.

Async and document.write

ControlJS loads scripts asynchronously. By default these async scripts are executed after window onload. If one of these async scripts calls document.write the page is erased. That’s because calling document.write after the document is loaded automatically calls document.open. Anything written to the open document replaces the existing document content. Obviously, this is not the desired behavior.

I wanted ControlJS to work on all scripts in the page. Scripts for ads are notorious for using document.write, but I also found scripts created by website owners that use document.write. I didn’t want those scripts to be inadvertently loaded with ControlJS and have the page get wiped out.

The problem was made slightly easier for me because ControlJS is in control when each inline and external script is being executed. My solution is to override document.write and capture the output for each script one at a time. If there’s no document.write output then proceed to the next script. If there is output then I create an element (for now a SPAN), insert it into the DOM immediately before the SCRIPT tag that’s being executed, and set the SPAN’s innerHTML to the output from document.write.
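A minimal sketch of this capture-and-insert idea follows. It is not the actual ControlJS source: `runWithCapturedWrite`, `doc`, `scriptEl`, and `runScript` are hypothetical names, and `doc` is assumed to be any object shaped like the browser's `document`.

```javascript
// Sketch only: runWithCapturedWrite and its parameters are hypothetical
// names, not part of the ControlJS API.
function runWithCapturedWrite(doc, scriptEl, runScript) {
  var originalWrite = doc.write;
  var originalWriteln = doc.writeln;
  var chunks = [];

  // Collect output instead of letting the browser reopen the document.
  doc.write = function () {
    for (var i = 0; i < arguments.length; i++) {
      chunks.push(String(arguments[i]));
    }
  };
  doc.writeln = function () {
    doc.write.apply(doc, arguments);
    chunks.push("\n");
  };

  try {
    runScript();
  } finally {
    // Always restore the originals, even if the script throws.
    doc.write = originalWrite;
    doc.writeln = originalWriteln;
  }

  var output = chunks.join("");
  if (output === "") {
    return null; // nothing was written; proceed to the next script
  }

  // Wrap the output in a SPAN placed immediately before the SCRIPT tag.
  var span = doc.createElement("span");
  span.innerHTML = output;
  scriptEl.parentNode.insertBefore(span, scriptEl);
  return span;
}
```

Because the override is installed and removed around a single script, any output can be attributed to that script and placed next to its tag.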

This works pretty well. I downloaded a local copy of CNN.com and converted it to ControlJS. There were two places where document.write was used (including an ad) and both worked fine. But there are issues. One issue I encountered is that wrapping the output in a SPAN can break certain behavior like hover handlers. But a bigger issue is that browsers ignore “<script…” added via innerHTML, and one of the main ways that ads use document.write is to insert scripts. I have a weak solution to this that parses out the “<script…” HTML and creates a SCRIPT DOM element with the appropriate SRC. It works, but is not very robust.
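To illustrate the kind of workaround described above, here is a rough sketch that pulls SRC values out of captured document.write output and creates SCRIPT elements for them. The helper names `extractScriptSrcs` and `appendScripts` are hypothetical, and the regular expression is deliberately simple: it ignores inline scripts and execution order, and it can be fooled by attributes such as `data-src`. That fragility is part of why this approach is not very robust.

```javascript
// Hypothetical helper: finds external scripts in a document.write string.
function extractScriptSrcs(html) {
  var srcs = [];
  // Match an opening <script ...> tag and capture a double-quoted,
  // single-quoted, or unquoted src value.
  var re = /<script\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))[^>]*>/gi;
  var match;
  while ((match = re.exec(html)) !== null) {
    if (match[1] !== undefined) {
      srcs.push(match[1]);
    } else if (match[2] !== undefined) {
      srcs.push(match[2]);
    } else {
      srcs.push(match[3]);
    }
  }
  return srcs;
}

// Hypothetical helper: `doc` is assumed to behave like the browser's
// document, and `parent` like a DOM element.
function appendScripts(doc, parent, html) {
  var srcs = extractScriptSrcs(html);
  for (var i = 0; i < srcs.length; i++) {
    var script = doc.createElement("script");
    script.src = srcs[i];
    parent.appendChild(script);
  }
  return srcs.length;
}
```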

ControlJS is still valuable even without a robust answer for document.write. You don’t have to convert every script in the page to use ControlJS. If you know an ad or widget uses document.write you can load those scripts the normal way. Or you can test and see how complex the document.write is – perhaps it’s a use case that ControlJS handles.

Real solutions for async document.write

I can’t end this series of posts without highlighting two alternatives that have solved the asynchronous document.write issue: Aptimize and GhostWriter.

Aptimize sells a web accelerator that changes HTML markup on the fly to have better performance. In September they announced the capability to load scripts asynchronously – including ads. As far as I know this is the first product that offered this feature. I had lunch with Ed and Derek from Aptimize this week and found out more about their implementation. It sounds solid.

Another solution is GhostWriter from Digital Fulcrum. Will contacted me and we had a conference call and a few email exchanges about his implementation, which uses an HTML parser on the document.write output. You can try the code for free.

Wrapping it up

ControlJS has many features including some that aren’t found in other script loading modules:

  • downloads scripts asynchronously
  • handles both inline scripts and external scripts
  • delays script execution until after the page has rendered
  • allows for scripts to be downloaded and not executed
  • integrates with simple changes to HTML – no code changes
  • solves some document.write async use cases
  • control.js itself is loaded asynchronously

ControlJS is open source. I hope some of these features are borrowed and added to other script loaders. There are still more features to implement (async stylesheets, better document.write handling) and ControlJS has had minimal testing. If you’d like to add features or fix bugs please take a look at the ControlJS project on Google Code and contact the group via the ControlJS discussion list.

22 Responses to ControlJS part 3: overriding document.write

  1. Actually, Catchpoint just announced a service to asynchronously load 3rd party scripts (http://blog.catchpoint.com/2010/12/07/outclip-l/). I’m not sure how they solved the document.write() problem. There is an unanswered question about it on that blog post.

    Nice series! Can’t wait to play with the code tomorrow!

  2. The question to Catchpoint was from me, and they answered me privately that it’s something they prefer to keep secret (for commercial reasons, I guess), while confirming that dynamic iframes are the key.

    I remember having tried dynamic iframes a long time ago, and like other solutions it works, but not with all the ads. So the question remains: how do we get something bulletproof?

  3. Hey Steve,

    Great articles. I’ve been looking at the document.write issue for my current contract and unfortunately I hit two big issues:

    1. A document.write that writes a script tag whose code contains a nested document.write, all within the ad code. I don’t know if you’ll be able to see this, but here is an example: http://bid.openx.net/jstag I’ve encountered more than 4 levels deep in some cases.

    2. This is the real deal killer at the moment: the script tag is written across multiple document.write function calls. Again, I don’t know if you’ll be able to see this, but here’s an example: http://view.atdmt.com/MOE/jview/284497587/direct/01/657966404?click=http://oas.autotrader.co.uk/5c/cars/usedform/L13/657966404/Middle/Autotrader/ford_sara_mpu_dec10/ford_sara_mpu_dec10.html/77664f43455579396f6f594141314535?

    in case you can’t see it an example would be:
    document.write(”);

    I’m considering some type of HTML validation for the document.write output: if it’s not valid, cache the string, wait for the next document.write, and try again. But this will more than likely end up being slower than just letting document.write do its thing.
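The buffering idea in this comment could be sketched roughly as follows. The names `looksComplete` and `createWriteBuffer` are hypothetical, and the "validation" here is only a crude heuristic (no unfinished tag, and balanced script open/close tags), not a real HTML validator.

```javascript
// Heuristic check: is the buffered markup safe to hand to innerHTML?
function looksComplete(html) {
  // An unfinished tag: the last "<" has no ">" after it.
  if (html.lastIndexOf("<") > html.lastIndexOf(">")) {
    return false;
  }
  // Every opened <script> must have been closed.
  var opens = (html.match(/<script\b/gi) || []).length;
  var closes = (html.match(/<\/script\s*>/gi) || []).length;
  return opens === closes;
}

// Returns a replacement for document.write that caches chunks until the
// accumulated string looks complete, then passes it to `flush`.
function createWriteBuffer(flush) {
  var pending = "";
  return function write(chunk) {
    pending += String(chunk);
    if (looksComplete(pending)) {
      var html = pending;
      pending = "";
      flush(html);
    }
  };
}
```

With this in place, a script tag split across several document.write calls reaches `flush` as one string, at the cost of extra string scanning on every call.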

  4. Sorry, I forgot to use HTML entities in the second example. Here it is again:
    document.write('<scr' + 'ipt src="');
    document.write('&some_ad_crap=' + foo);
    document.write('some_more_ad_crap=' + bar);
    document.write('even_more_ad_crap=' + Math.random() * 10000000);
    document.write('"></scr' + 'ipt>');

  5. Sadly, a lot of third-party content that requires document.write() doesn’t allow you to tinker with their code, Google AdSense being an example. I wonder if using ControlJS in these cases would be considered a violation of the terms and conditions.

  6. A great post Steve!

    One small comment – AcceloWeb has had a document.write() solution for about a year and a half now.
    It has been tested and is deployed live on multiple websites. AcceloWeb’s safe script postponing feature works out-of-the-box on more than 90% of all scripts (including ones with document.write()) and with a bit of configuration, it has worked on every web page we’ve ever encountered!

    Leonid Fainberg
    Co-Founder and CTO
    AcceloWeb

  7. @Mathias has a valid point on the violation of Terms of Service of ad providers.
    Please comment if you have experience with modifying ad code with document.write and discussing this with the ad code provider/creator. Do they listen and care about the performance argument? Or is it a simple ‘hey, use our code or we kick you out’?

    @Leonid care to share part of that solution? I know you are a commercial service provider, but imo, Acceloweb can use some attention from the industry, so maybe sharing is a good way to accomplish that…

  8. As far as the implementation goes, you could output the spans with IDs that are array keys to the captured document.write content, without outputting the content right away.

    Once the page has loaded, loop through the captured document.write array and use DOM methods (createElement and replaceChild) for each element in the array. (https://developer.mozilla.org/En/DOM/Node.replaceChild)

    This way you don’t end up with the extra spans in the output.

  9. @christos Those cases are quite common as network ad delivery becomes more prevalent and clients are bounced between multiple ad servers. Beyond that, the fact that inline scripts, when document.write()’en, execute synchronously and elements are immediately available make this even more difficult.

    This, for example, will work:

    document.write("<div id=foo><\/div>");
    document.write("<script> var _foo= document.getElementById('foo');<\/script>");
    typeof foo == 'object' && foo === _foo;

    The global `foo` is available since all #ids are available as global variables (legacy behavior; Firefox will warn, but it is still in use). In addition, the inline script executes synchronously, and therefore `_foo` is initialized before control returns to our calling script.

    GhostWriter’s architecture allows for this since it emulates the browser’s HTML parser as closely as possible. Not only that, but because it’s able to inspect elements as they are embedded it can do some cool things like monitoring availability, measuring performance and skip over certain elements.

    It’s true that document.write() is a decidedly un-elegant solution: First and foremost its post-load behavior is astonishing, especially to newcomers. Second, it’s always difficult to create a large string no matter the language (think about all the SQL you’ve had to write in PHP and how grateful you were when an ORM came to the rescue). Why is it still so prevalent? Simply put: document.write() provides functionality that cannot be emulated using DOM methods.

    Yes, this may raise the ire of many but it’s not untrue. Why do ads make such prevalent use of document.write()? Because without relying on a second callback, it is the most reliable way a widget can “know” where the HTML author intends for it to be placed. If the goal is to make widget placement simple for the HTML author, there’s nothing simpler than: Put this script wherever you want the widget to go on the page. For ads, this is a requirement. The spec provides no mechanism to allow a script to “know” where its tag is (nor should it), so scripts that want to be polite are forced to use a technique like this:

    var map = document.getElementsByTagName('script'),
        me = map[map.length - 1],
        div = document.createElement('div');
    me.parentNode.insertBefore(div, me);
    window.onload = function () { div.innerHTML = "put creative in here"; };

    This script assumes it’s been included inline and, if you’re able to make that assumption, why not just use document.write() to embed that DIV? It’s much more performant.

    Another reason document.write is so widely used is it is still the most reliable way HTML can be embedded using a script.
    The `innerHTML` property does not provide sufficient parsing for script tags (as Steve points out) and `innerHTML` still encourages developers to write scripts that create Strings, one of the reasons document.write is considered poor form.

    The “real” way to create elements, DOM methods, unfortunately don’t work as expected (primarily in Internet Explorer). If you’ve ever tried to create a SWF using `createElement` you know what I mean. Even the excellent SWFObject has to use innerHTML in IE. Moreover, IE exhibits buggy behavior for other elements ranging from minor (the element doesn’t work properly) to catastrophic (browser crash).

    Finally, document.write() provides the most consistent mechanism to download remote scripts in parallel and still execute them in order. While HTML5 makes some attempt to standardize how this can be done dynamically, IMO, it doesn’t go far enough, and global synchronized execution queues and the like eventually lead us back to where we are now.

    In short, if you’ve got HTML, there is no easier (and in some cases no other) way to embed it using JavaScript. This is the unfortunate reality of the situation. iFrames simply do not provide the same functionality and neither do the DOM methods.

  10. The Aragon product has a similar solution that works 100% on complex web pages; it was experimented with more than a year ago.

  11. The biggest issue with `document.write` is that it encourages synchronous programming. No script should have the right to control the page before it’s ready. The ideal way is to give a stream to a 3rd party script and say: “This is your sandbox.”

    You can’t do it safely today (although there are efforts like Caja and AdSafe).

    The first rule of thumb is still that you should only use 3rd party scripts you trust.

    Now, instead of `document.write` ads the following form would be better:

    <div id=”ex_34286351″></div>

    <div id=”ex_34286352″></div>

    <script src=”ex_ad.js”></script>
    var ex_client = “pub-3428635”;
    load(“ad.js”, function () {
    ex_create_ad(“34286351”, 300, 250);
    ex_create_ad(“34286352”, 728, 29);
    });

  12. In the example, it shows that the ad script can be loaded either the normal way or async. The two cases were merged into one in a wrong way. :)

  13. @christos: Yea, like I said, there are many edge cases that my simple approach doesn’t handle. I wonder if the commercial solutions do?

    @Leonid: Sorry I didn’t give you due credit! Please provide a link to info on how you solve this problem.

    @Hardy: To clarify, the overriding of document.write happens in the execute phase which is after the page has loaded. And the SPAN is only added if it’s needed.

    @Will: Wow, thanks for the detailed feedback.

  14. @balazas I noticed Google ads/DoubleClick can already load like that. When I look at the source of http://osnews.com/ I see:

    – load .js in the head
    – GS_googleAddAdSenseService (‘id’);
    – GS_googleEnableAllServices();
    – GA_googleAddSlot(‘id’, ‘name’);
    – GA_googleFetchAds();
    – GA_googleFillSlot(‘name’);

  15. Steve,

    I have been following your blog for several months, thanks for your eye opening articles!

    If you don’t mind, I have a question related to a specific point of your article: how can you locate the SCRIPT tag that’s being executed, to insert the span tag? I am mainly concerned about asynchronously loaded scripts.

    For the record, I asked this question on stackoverflow last month:
    http://stackoverflow.com/questions/4172180/recommended-method-to-locate-the-current-script

  16. @Christophe: In my case, ControlJS is iterating over each script element (getElementsByTagName('script')). If the script has type="text/cjs" then the document.write override is applied. Your situation is different – your script’s execution is kicked off by the browser. Of the alternatives you list on Stack Overflow, adding an id to the script is best.
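As a rough illustration of that iteration (not the actual ControlJS code), the sketch below collects the script elements marked with type="text/cjs". `findControlledScripts` is a hypothetical name, and `doc` is assumed to behave like the browser's `document`.

```javascript
// Hypothetical helper: returns the scripts a loader like ControlJS would
// execute itself, i.e. those the browser skipped because of their type.
function findControlledScripts(doc) {
  var all = doc.getElementsByTagName("script");
  var controlled = [];
  for (var i = 0; i < all.length; i++) {
    if (all[i].getAttribute("type") === "text/cjs") {
      controlled.push(all[i]);
    }
  }
  return controlled;
}
```

Since the loader walks the elements itself, it always knows which SCRIPT tag is "current" and needs no lookup by id.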

  17. Great articles Steve!

    I’d like to add that non-commercial document.write hacks exist. They are probably not as robust as the commercial solutions, but might be worth checking out. ExtSrcJS has been out for a few months, and I’m working on one with a slightly different approach. I have tried to document some of the gotchas here:
    https://github.com/gregersrygg/crapLoader/wiki/What-to-think-about-when-replacing-document.write

    Contributions are welcome :)

    ExtSrcJS is more similar to ControlJS in that it uses a script tag with a different type attribute. It then starts to load the scripts after the page has loaded.
    http://code.google.com/p/extsrcjs/

  18. Looking at the source I noticed typeof was used like a function when it’s not.

    typeof foo === ‘undefined’
    not
    typeof(foo) === ‘undefined’

  19. Steve, thanks for the advice. What still bothers me with “id” is that the script is widget type, and the user can install several in the page. It should still work, but creates invalid markup.
    I am definitely interested in your approach here (custom type), it was not in my list simply because I didn’t even think about it! I’ll give it a try (the application I work with is SharePoint btw).

    Thanks again!

  20. Hi

    I was just wondering, have you tried inserting the document.write content in a div instead of a span? It struck me that some of the problems you’re having could be due to inserting the content into an inline element rather than a block-level element.

    Cheers

  21. How does Control.JS compare with head.js (headjs.com)?

  22. Dan: The main features of ControlJS are listed at http://controljs.org. A good place to start a comparison is to see if the alternatives (in this case, head.js) have those features.