User Agents in the morning
Every working day, a script runs at 7am that opens ~20 websites in my browser. I open them at 7am so that they’re ready for me when I sit down with my coffee. I’m the performance guy – I can’t stand waiting for a page to load. Among the sites that I read everyday are blogs (Ajaxian, O’Reilly Radar, Google Reader for the rest), news sites (MarketWatch, CNET Tech News, InternetNews, TheStreet.com), and stuff for fun and life (Dilbert, Woot, The Big Picture, Netflix).
The last site is a page related to UA Profiler. It lists all the new user agents that have been tested in the last day. These are unique user agents – they’ve never been seen by UA Profiler before. When I first launched UA Profiler, there were about 50 each day. Now, it’s down to about 20 per day. But I’ve skipped over the main point.
Why do I review these new user agents every morning?
When I started UA Profiler, I assumed I would be able to find a library to accurately parse the HTTP User-Agent string into its components. I need this in order to categorize the test results. Was the test done with Safari or iPhone? Internet Explorer or Maxthon? NetNewsWire or OmniWeb? My search produced some candidates, but none of them had the level of accuracy I wanted, unable to properly classify edge case browsers, mobile devices, and new browsers (like Chrome and Android).
So, I rolled my own.
I find that it’s very accurate – more accurate than anything else I could find. Another good site out there is UserAgentString.com, but even they misclassify some well known browsers such as iPhone, Shiretoko, and Lunascape. When I do my daily checks I find that every 200-400 new user agents requires me to tweak my code. And I’ve written some good admin tools to do this check – it only takes 5 minutes to complete. And the code tweaks, when necessary, take less than 15 minutes.
It’s great that this helps UA Profiler, but I’d really like to share this with the web community. The first step was adding a new Parse User-Agent page to UA profiler. You can paste any User-Agent string and see how my code classifies it. I also show the results from UserAgentString.com for comparison. The next steps, if there’s interest and I can find the time, would be to make this available as a web service and make the code available, too. What do people think?
- Do other people share this need for better User Agent parsing?
- Do you know of something good that’s out there that I missed?
- Do you see gaps or mistakes in UA Profiler’s parsing?
For now, I’ll keep classifying user agents as I finish the last drops of my (first) coffee in the morning.