Comparing RUM & Synthetic Page Load Times
Yesterday I read Etsy’s October 2012 Site Performance Report. Etsy is one of only a handful of companies that publish their performance stats with explanations and future plans. It’s really valuable (and brave!), and gives other developers an opportunity to learn from an industry leader. In the report, Etsy mentions that the page load time stats are gathered from a private instance of WebPagetest. They explain their use of synthetically-generated measurements instead of RUM (Real User Monitoring) data:
You might be surprised that we are using synthetic tests for this front-end report instead of Real User Monitoring (RUM) data. RUM is a big part of performance monitoring at Etsy, but when we are looking at trends in front-end performance over time, synthetic testing allows us to eliminate much of the network variability that is inherent in real user data. This helps us tie performance regressions to specific code changes, and get a more stable view of performance overall.
Etsy’s choice of synthetic data for tracking performance as part of their automated build process totally makes sense. I’ve talked to many companies that do the same thing. Teams dealing with builds and code regressions should definitely do this. BUT… it’s important to include RUM data when sharing performance measurements beyond the internal devops team.
Why should RUM data always be used when talking beyond the core team?
The issue with only showing synthetic data is that it typically makes a website appear much faster than it actually is. This has been true since I first started tracking real user metrics back in 2004. My rule of thumb is that your real users are experiencing page load times that are twice as long as the corresponding synthetic measurements.
RUM data, by definition, comes from real users. It is the ground truth for what users are experiencing. Synthetic data, even when generated using real browsers over a real network, can never match the diversity of performance variables that exist in the real world: browsers, mobile devices, geo locations, network conditions, user accounts, page view flow, etc. The reason we use synthetic data is that it lets us create a consistent testing environment by eliminating those variables. The variables we choose for synthetic testing (hopefully) match a segment of users, but they can’t capture the diversity of users that actually visit our websites every day. That’s what RUM is for.
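To make that concrete, here’s a minimal sketch of how a single RUM data point is typically collected in the browser using the Navigation Timing API. The /rum-beacon endpoint is hypothetical, and a real deployment would also handle sampling and older browsers that lack the API:

```typescript
// Minimal RUM collection sketch (assumes a hypothetical /rum-beacon endpoint).
// Each real page view reports its own load time, so the aggregate data
// reflects every browser, device, and network that actually hits the site.
window.addEventListener("load", () => {
  // loadEventEnd isn't populated until the load handler finishes,
  // so defer the measurement by one tick.
  setTimeout(() => {
    const t = performance.timing;
    const pageLoadMs = t.loadEventEnd - t.navigationStart;
    // Classic image beacon: works everywhere, no response needed.
    const beacon = new Image();
    beacon.src =
      "/rum-beacon?plt=" + pageLoadMs +
      "&page=" + encodeURIComponent(location.pathname);
  }, 0);
});
```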
The core team is likely aware of the biases and assumptions that come with synthetic data. They know that it was generated using only laptops and doesn’t include any mobile devices; that it used a simulated LAN connection and not a slower DSL connection; that IE 9 was used and IE 6 & 7 weren’t included. Heck, they probably specified these test conditions. The problem is that the people outside the team who see the (rosy) synthetic metrics aren’t aware of these caveats. Even if you note these caveats on your slides, they still won’t remember them! What they will remember is that you said the page loaded in 4 seconds, when in reality most users are getting a time closer to 8 seconds.
How different are RUM measurements compared to synthetic?
As I said a minute ago, my rule of thumb is that RUM page load times are typically 2x what you see from synthetic measurements. After I commented on the Etsy blog post suggesting they add RUM data, and @jkowall tweeted less than 24 hours later asking for data comparing RUM to synthetic, I decided to gather some real data from my own website.
Similar to Etsy, I used WebPagetest to generate synthetic measurements. I chose a single URL: http://www.stevesouders.com/blog/2012/10/11/cache-is-king/. I measured it using a simulated DSL connection in Chrome 23, Firefox 16, and IE 9. I measured both First View (empty cache) and Repeat View (primed cache). I did three page loads and chose the median. My RUM data came from Google Analytics’ Site Speed feature over the last month. As shown in this chart of the page load time results, the RUM page load times are 2-3x slower than the synthetic measurements.
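As a side note, here’s a sketch of the “chose the median” step. The three run times are hypothetical, picked so the median matches the Chrome 23 First View figure reported below:

```typescript
// Pick the median of an odd number of synthetic runs (sketch).
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}

// Three hypothetical First View runs in Chrome 23:
console.log(median([4.92, 4.64, 4.31])); // 4.64
```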
There’s some devil in the details. The synthetic data could have been more representative: I could have done more than three page loads, tried different network conditions, and even chosen different geo locations. The biggest challenge was mixing the First View and Repeat View page load times to compare to RUM. The RUM data contains both empty cache and primed cache page views, but the split is unknown. A study Tenni Theurer and I did in 2007 showed that ~80% of page views are done with a primed cache. To be more conservative I averaged the First View and Repeat View measurements, and called that “Synth 50/50” in the chart. The following table contains the raw data:
|                              | Chrome 23 | Firefox 16 | IE 9 |
|------------------------------|-----------|------------|------|
| Synthetic First View (secs)  | 4.64      | 4.18       | 4.56 |
| Synthetic Repeat View (secs) | 2.08      | 2.42       | 1.86 |
| Synthetic 50/50 (secs)       | 3.36      | 3.30       | 3.21 |
| RUM data points              | 94        | 603        | 89   |
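As a sanity check on the table, here’s a small sketch that blends the First View and Repeat View times for an assumed fraction of primed-cache page views. It also shows why 50/50 is the conservative choice: the ~80% primed-cache split from the 2007 study would make the synthetic number even faster, and the gap to RUM even larger:

```typescript
// Blend First View (empty cache) and Repeat View (primed cache) times
// for an assumed fraction of primed-cache page views.
function blend(firstView: number, repeatView: number, primedRatio: number): number {
  return firstView * (1 - primedRatio) + repeatView * primedRatio;
}

// Chrome 23 numbers from the table above:
const fv = 4.64, rv = 2.08;
console.log(blend(fv, rv, 0.5).toFixed(2)); // "3.36" -- the Synth 50/50 figure
console.log(blend(fv, rv, 0.8).toFixed(2)); // "2.59" -- with the ~80% primed split
```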
In my experience these results are typical: RUM page load times are much slower than synthetic measurements. I’d love to hear from other website owners about how their RUM and synthetic measurements compare. In the meantime, be cautious about only showing your synthetic page load times: the real user experience is likely quite a bit slower.