January 2009

User Agents in the morning

Every working day, a script runs at 7am that opens ~20 websites in my browser. I open them at 7am so that they’re ready for me when I sit down with my coffee. I’m the performance guy – I can’t stand waiting for a page to load. Among the sites that I read every day are blogs (Ajaxian, O’Reilly Radar, Google Reader for the rest), news sites (MarketWatch, CNET Tech News, InternetNews, TheStreet.com), and stuff for fun and life (Dilbert, Woot, The Big Picture, Netflix).

The last site is a page related to UA Profiler. It lists all the new user agents that have been tested in the last day. These are unique user agents – they’ve never been seen by UA Profiler before. When I first launched UA Profiler, there were about 50 each day. Now, it’s down to about 20 per day. But I’ve skipped over the main point.

Why do I review these new user agents every morning?

When I started UA Profiler, I assumed I would be able to find a library to accurately parse the HTTP User-Agent string into its components. I need this in order to categorize the test results. Was the test done with Safari or iPhone? Internet Explorer or Maxthon? NetNewsWire or OmniWeb? My search produced some candidates, but none of them had the level of accuracy I wanted. They couldn’t properly classify edge-case browsers, mobile devices, and new browsers (like Chrome and Android).

So, I rolled my own.

I find that it’s very accurate – more accurate than anything else I could find. Another good site out there is UserAgentString.com, but even they misclassify some well-known browsers such as iPhone, Shiretoko, and Lunascape. In my daily checks, I find that I need to tweak my code about once every 200-400 new user agents. I’ve written some good admin tools for this check, so it only takes 5 minutes to complete. The code tweaks, when necessary, take less than 15 minutes.
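As a rough illustration of what classifying a User-Agent string involves, here is a minimal Python sketch. It is not UA Profiler's actual code: the rule list, the regular expressions, and the family labels are all assumptions made up for this example. It only shows why rule order matters: browsers such as Chrome and Maxthon also carry the tokens of the browsers they build on, so the more specific rules have to be checked first.

```python
import re

# Hypothetical rules, checked in order. More specific products come first
# because many browsers also advertise the tokens of another browser
# (Chrome includes "Safari/", Maxthon includes "MSIE").
VERSION = r"(\d+(?:\.\d+)*)"
RULES = [
    ("Maxthon", re.compile(r"Maxthon[ /]?" + VERSION + r"?")),
    ("Chrome", re.compile(r"Chrome/" + VERSION)),
    ("iPhone", re.compile(r"iPhone.*Version/" + VERSION)),
    ("Safari", re.compile(r"Version/" + VERSION + r".*Safari/")),
    ("IE", re.compile(r"MSIE " + VERSION)),
    ("Firefox", re.compile(r"Firefox/" + VERSION)),
]


def classify(user_agent):
    """Return (family, version) for the first matching rule.

    Falls back to ("Other", None) when no rule matches; version is None
    when a rule matches but the string carries no version number.
    """
    for family, pattern in RULES:
        match = pattern.search(user_agent)
        if match:
            return family, match.group(1)
    return "Other", None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 "
        "(KHTML, like Gecko) Chrome/1.0.154.36 Safari/525.19",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)",
    ]
    for ua in samples:
        print(classify(ua))
```

A fixed rule list like this is exactly the kind of thing that needs a tweak whenever a new browser shows up with an unfamiliar token, which is why reviewing unseen user agents regularly is useful.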

It’s great that this helps UA Profiler, but I’d really like to share this with the web community. The first step was adding a new Parse User-Agent page to UA Profiler. You can paste any User-Agent string and see how my code classifies it. I also show the results from UserAgentString.com for comparison. The next steps, if there’s interest and I can find the time, would be to make this available as a web service and make the code available, too. What do people think?

  • Do other people share this need for better User Agent parsing?
  • Do you know of something good that’s out there that I missed?
  • Do you see gaps or mistakes in UA Profiler’s parsing?

For now, I’ll keep classifying user agents as I finish the last drops of my (first) coffee in the morning.


CS193H video preview

My class at Stanford, CS193H High Performance Web Sites, was videotaped. Stanford does this so that people enrolled through the Stanford Center for Professional Development, who work full time, can watch the class during off hours. SCPD also makes some of the class videos available to the public. I’m currently talking with SCPD about releasing my videos, but in the meantime they’ve released the video of my first class. This lecture covers the logistics of the class (syllabus, mailing list, etc.). I’ve released all the slides from the class. You can find links to the slides in the class schedule. Anyone going through the slides should watch this intro video to get a flavor for how the class was conducted.

If you would be interested in watching the videos from this class, please add a comment below. The more interest there is, the more likely SCPD will be to make the videos available.

Update: The videos are now available! Thanks for all the positive feedback. You can watch the first three lectures for free. Tuition for the full set of 25 lectures is $600. The videos are offered as XCS193H on SCPD.
