Silk, iPad, Galaxy comparison

December 1, 2011 9:51 am | 11 Comments

In my previous blog post I announced Loadtimer – a mobile test harness for measuring page load times. I was motivated to create Loadtimer because recent reviews of the Kindle Fire lacked the quantified data and reliable test procedures needed to compare browser performance.

Most performance evaluations of Silk that have come out since its launch have two conclusions:

  1. Silk is faster when acceleration is turned off.
  2. Silk is slow compared to other tablets.

Let’s poke at those more rigorously using Loadtimer.

Test Description

In this test I’m going to compare the following tablets: Kindle Fire (with acceleration on and off), iPad 1, iPad 2, Galaxy 7.0, and Galaxy 10.1.

The test is based on how long it takes for web pages to load on each device. I picked 11 URLs that are top US websites:

Some popular choices (Google, YouTube, and Twitter) weren’t selected because they have framebusting code and so don’t work in Loadtimer’s iframe-based test harness.

The set of 11 URLs were loaded 9 times on each device. The set of URLs was randomized for each run. All the tests were conducted on my home wifi over a Comcast cable modem. (Check out this photo of my test setup.) All the tests were done at the same time of day over a 3 hour period. I did one test at a time to avoid bandwidth contention, and rotated through the devices doing one run at a time. I cleared the cache between each run.

Apples and Oranges

The median page load time for each URL on each device is shown in the Loadtimer Results page. It’s a bit complicated to digest. The fastest load time is shown in green and the slowest is red – that’s easy. The main complication is that not every device got the same version of a given URL. Cells in the table that are shaded with a gray background were cases where the device received a mobile version of the URL. Typically (but not always) the mobile version is lighter than the desktop version (fewer requests, fewer bytes, less JavaScript, etc.) so it’s not valid to do a heads up comparison of page load times between desktop and mobile versions.

Out of 11 URLs, the Galaxy 7.0 received 6 that were mobile versions. The Galaxy 10.1 and Silk each received 2 mobile versions, and the iPads each had only one mobile version across the 11 URLs.

In order to gauge the difference between the desktop and mobile versions, the results table shows the number of resources in each page. eBay, for example, had 64 resources in the desktop version, but only 18-22 in the mobile version. Not surprisingly, the three tablets that received the lighter mobile version had the fastest page load times. (If a mobile version was faster than the fastest desktop version, I show it in non-bolded green with a gray background.)

This demonstrates the importance of looking at the context of what’s being tested. In the comparisons below we’ll make sure to keep the desktop vs mobile issue in mind.

Silk vs Silk

Let’s start making some comparisons. The results table is complicated when all 6 rows are viewed. The checkboxes are useful for making more focused comparisons. The Silk (accel off) and Silk (accel on) results show that indeed Silk performed better with acceleration turned off for every URL. This is surprising, but there are some things to note.

First, this is the first version of Silk. Jon Jenkins, Director of Software Development for Silk, spoke at Velocity Europe a few weeks back. In his presentation he shows different places where the split in Silk’s split architecture could happen (slides 26-28). He also talked about the various types of optimizations that are part of the acceleration. Although he didn’t give specifics, it’s unlikely that all of those architectural pieces and performance optimizations have been deployed in this first version of Silk. The test results show that some of the obvious optimizations, such as concatenating scripts, aren’t happening when acceleration is on. I expect we’ll see more optimizations rolled out during the Silk release cycle, just as we do for other browsers.

A smaller but still important issue is that although the browser cache was cleared between tests, the DNS cache wasn’t cleared. When acceleration is on there’s only one DNS lookup needed – the one to Amazon’s server. When acceleration is off Silk has to do a DNS lookup for every unique domain – an average of 13 domains per page. Having all of those DNS lookups cached gives an unfair advantage to the “acceleration off” page load times.

I’m still optimistic about the performance gains we’ll see as Silk’s split architecture matures, but for the remainder of this comparison we’ll use Silk with acceleration off since that performed best.

Silk vs iPad

I had both an iPad 1 and iPad 2 at my disposal so included both in the study. The iPad 1 was the slowest across all 11 URLs so I restricted the comparison to Silk (accel off) and iPad 2.

The results are mixed with iPad 2 being faster for most but not all URLs. The iPad 2 is fastest in 7 URLs. Silk is fastest in 3 URLs. One URL (eBay) is apples and oranges since Silk gets a mobile version of the site (18 resources compared to 64 resources for the desktop version).

Silk vs Galaxy

Comparing the Galaxy 7.0 to any other tablet is not fair since Galaxy 7.0 receives a lighter mobile version in 6 of 11 URLs. The Galaxy 7.0 has the slowest page load time in 3 of the 4 URLs where it, Galaxy 10.1, and Silk all receive the desktop version. Since it’s slower head-to-head and has mobile versions in the other URLs, I’ll focus on comparing Silk to the Galaxy 10.1.

Silk has the fastest page load time in 7 URLs. The Galaxy 10.1 is faster in 3 URLs. One URL is mixed as Silk gets a mobile version (18 resources) while the Galaxy 10.1 gets a desktop version (64 resources).

 Takeaways

These results show that, as strange as it might sound, Silk appears to be faster when acceleration is turned off. Am I going to turn off acceleration on my Kindle Fire? No. I don’t want to miss out on the next wave of performance optimizations in Silk. The browser is sound. It holds its own compared to other tablet browsers. Once the acceleration gets sorted out I expect it’ll do even better.

More importantly, it’s nice to have some real data and to have Loadtimer to help with future comparisons. Doing these comparisons to see which browser/tablet/phone is fastest makes for entertaining reading and heated competition. But all of us should expect more scientific rigor in the reviews we read, and push authors and ourselves to build and use better tools for measuring performance. I hope Loadtimer is useful. Loadtimer plus pcapperf and the Mobile Perf bookmarklet are the start of a mobile performance toolkit. Between the three of them I’m able to do most of what I need for analyzing mobile performance. It’s still a little clunky, but just as it happened in the desktop world we’ll see better tools with increasingly powerful features across more platforms as the industry matures. It’s still early days.

 

11 Comments

Loadtimer: a mobile test harness

December 1, 2011 1:35 am | 14 Comments

Measuring mobile performance is hard

When Amazon announced their Silk browser I got excited reading about the “split architecture”. I’m not big on ereaders but I pre-ordered my Kindle Fire that day. It arrived a week or two ago. I’ve been playing with it trying to find a scientific way to measure page load times for various websites. It’s not easy.

  • Since it’s a new browser and runs on a tablet we don’t have plugins like Firebug.
  • It doesn’t (yet) support the Navigation Timing spec, so even though I can inspect pages using Firebug Lite (via the Mobile Perf bookmarklet) and Weinre (I haven’t tried it but I assume it works), there’s no page load time value to extract.
  • Connecting my Fire to a wifi hotspot on my laptop running tcpdump (the technique evangelized by pcapperf) doesn’t work in accelerated mode because Silk uses SPDY over SSL. This technique works when acceleration is turned off, but I want to see the performance optimizations.

While I was poking at this problem a bunch of Kindle Fire reviews came out. Most of them talked about the performance of Silk, but I was disappointed by the lack of scientific rigor in the testing. Instead of data there were subjective statements like “the iPad took about half as long [compared to Silk]” and “the Fire routinely got beat in rendering pages but often not by much”. Most of the articles did not include a description of the test procedures. I contacted one of the authors who confided that they used a stopwatch to measure page load times.

If we’re going to critique Silk and compare its performance to other browsers we need reproducible, unbiased techniques for testing performance. Using a stopwatch or loading pages side-by-side and doing a visual comparison to determine which is faster are not reliable methods for measuring performance. We need better tools.

Introducing Loadtimer

Anyone doing mobile web development knows that dev tools for mobile are lacking. Firebug came out in 2006. We’re getting close to having that kind of functionality in mobile browsers using remote debuggers, but it’s pretty safe to say the state of mobile dev tools is 3-5 years behind desktop tools. It might not be sexy, but there’s a lot to be gained from taking tools and techniques that worked on the desktop and moving them to mobile.

In that vein I’ve been working the last few days to build an iframe-based test harness similar to one I built back in 2003. I call it Loadtimer. (I was shocked to see this domain was available – that’s a first.) Here’s a screenshot:

The way it works is straightforward:

  • It’s preloaded with a list of popular URLs. The list of URLs can be modified.
  • The URLs are loaded one-at-a-time into the iframe lower in the page.
  • The iframe’s onload time is measured and displayed on the right next to each URL.
  • If you check “record load times” the page load time is beaconed to the specified URL. The beacon URL defaults to point to loadtimer.org, but you can modify it if, for example, you’re testing some private pages and want the results to go to your own server.
  • You can’t test websites that have “framebusting” code that prevents them from being loaded in an iframe, such as Google, YouTube, Twitter, and NYTimes.

There are some subtle optimizations worth noting:

  • You should clear the cache between each run (unless you explicitly want to test the primed cache experience). There’s no way for the test harness to clear the cache, but it does have a check that helps remind you to clear the cache. (It loads a script that is known to take 3 seconds to load – if it takes less than 3 seconds it means the cache wasn’t cleared.)
  • It’s possible that URL 1′s unload time could make URL 2′s onload time be longer than it actually should be. To avoid this about:blank is loaded between each URL.
  • The order of the preset URLs is randomized to mitigate biases across URLs, for example, where URL 1 loads resources used by URL 2.

Two biases that aren’t addressed by Loadtimer:

  • DNS resolutions aren’t cleared. I don’t think there’s a way to do this on mobile devices short of power cycling. This could be a significant issue when comparing Silk with acceleration on and off. When acceleration is on there’s only one DNS lookup, whereas when acceleration is off there’s a DNS lookup for each hostname in the page (13 domains per page on average). Having the DNS resolutions cached gives an advantage to acceleration being off.
  • Favicons aren’t loaded for websites in iframes. This probably has a negligible impact on page load times.

 Have at it

The nice thing about the Loadtimer test harness is that it’s web-based – nothing to install. This ensures it’ll work on all mobile phones and tablets that support JavaScript. The code is open source. There’s a forum for questions and discussions.

There’s also a results page. If you select the “record load times” checkbox you’ll be helping out by contributing to the crowdsourced data that’s being gathered. Getting back to what started all of this, I’ve also been using Loadtimer the last few days to compare the performance of Silk to other tablets. Those results are the topic of my next blog post – see you there.

 

14 Comments