Author: Lindsey Simon (@elsighmon) Published: December 19, 2010 Source: Web Performance Calendar, 2010 Edition
About the author
Lindsey Simon is a front-end developer for Google's User Experience group and project lead for the open source Browserscope.org project. He hails from Austin, TX, where he worked at startups, taught computing at the Griffin School, and served as webmaster for the Austin Chronicle. He currently lives in San Francisco, writes acoustic guitar songs, and helps run a foodie website, dishola.com.
Introduction
This entry takes a slightly liberal interpretation of performance and applies it to task completion and UI design. A few years ago there was a JavaScript library "speed war," largely focused on selector matching engine benchmarks. Peter Higgins's TaskSpeed suite (early 2009) was inspiring because it tested the performance of groups of common operations rather than isolated functions.
The Google Translate redesign problem
Mid-2010, my team began redesigning Google Translate, focusing on the language selection process. The existing UI required four clicks from a mouse/touchpad user to pick a language pair: open the "from" SELECT, choose a language, open the "to" SELECT, and choose again. With 51 languages (and growing), picking from a native HTML SELECT dropdown was known to be slow and frustrating. A prototype alternative picker felt better, but we wanted hard data.
Using Browserscope for a quick A/B test
While Google has infrastructure for small-percentage experiments, integrating a large UI change into the Translate frontend would have taken time. As project lead for Browserscope, I realized its new User Test feature, which lets developers store timing data and view medians correlated by user agent via a simple JavaScript include, could do the job. We built a quick A/B test, hosted it at groupmenuselect.appspot.com, and shared the link with friends. Results were viewable on Browserscope, grouped by browser.
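The core of a test like this is small: time how long one selection task takes, then publish the number for Browserscope to aggregate. A minimal sketch of that harness follows; the variant key `group_menu` is hypothetical, and the actual beacon wiring (assigning the results object to a global and appending the Browserscope script tag) is described in the User Test howto linked below.

```javascript
// Times a single selection task. In the page, start() would run when the
// picker opens and stop() in the selection handler. The injectable clock
// makes the timer testable outside a browser.
function createTaskTimer(now = Date.now) {
  let startedAt = null;
  return {
    start() { startedAt = now(); },
    // Returns elapsed milliseconds for one completed task.
    stop() {
      if (startedAt === null) throw new Error('timer was never started');
      const elapsed = now() - startedAt;
      startedAt = null;
      return elapsed;
    },
  };
}

// Browserscope's beacon reads an object of numeric results and computes
// medians per user agent, so the page would assign this to a global
// before loading the beacon script. The key name is an assumption.
function buildResults(variant, elapsedMs) {
  return { [variant]: Math.round(elapsedMs) };
}
```

In the live test, each page load would be assigned one variant (native SELECT or the new picker), call `stop()` when the user commits a choice, and beacon the result, so the per-browser medians on Browserscope compare the two designs directly.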
What we learned
- The proposed new picker (GroupMenuSelect) was faster on average.
- The average time for a desktop user to choose a language from the native SELECT was 3.7 seconds, while the average time with GroupMenuSelect was 2.5 seconds.
- While the saving is only about a second per pick, frequent users (such as language learners) would see meaningful gains, and the savings multiply across all selections.
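The per-use saving falls straight out of those averages. A quick back-of-envelope, using the measured figures above (the per-pair number assumes the user changes both languages, which won't always be the case):

```javascript
// Averages measured in the A/B test; the usage pattern is an assumption.
const nativeSelectMs = 3700;  // avg. time to pick a language from the native SELECT
const groupMenuMs = 2500;     // avg. time with GroupMenuSelect

const savedPerPickMs = nativeSelectMs - groupMenuMs;   // 1200 ms saved per pick
const picksPerPair = 2;                                // a "from" and a "to" language
const savedPerPairMs = savedPerPickMs * picksPerPair;  // 2400 ms when both change
```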
Looking at per-browser results was illuminating. Most surprising was that the difference on Safari was essentially negligible. Examining screenshots of how each OS renders a native SELECT revealed a clear correlation between the number of options visible without scrolling and the speed of choosing the right one, which supported the new design's motivation.
Native SELECT visible option counts
- Total: 51 language options
- Mac: 45 visible at a time
- Windows: 30 visible at a time
- Ubuntu: 20 visible at a time
- Android: 7 visible at a time
- iPhone: 5 visible at a time, with an extra click for "Done"
Broad conclusion
Once a SELECT list contains roughly 30 options or more (as in a language picker), most desktop users would benefit greatly from an alternative interface.
Mobile caveats
Speed improvements were less clear on mobile. Horizontal constraints and tiny text made it hard to read all the languages at once, undermining the new design's advantages. On Android, GroupMenuSelect completion times were sometimes higher than with the native SELECT; the Android result row was also incomplete because the browser refused to fire mousedown events on the native SELECT, and missed taps weren't well captured by our timer. On iPhone, the layout, interactions, and affordances worked well for GroupMenuSelect, yielding a slight gain over the native control, which requires considerable scrolling. Ultimately we flipped a bit in Translate to disable GroupMenuSelect on mobile before launch, pending more definitive data.
Outcome and takeaway
After the August launch, more granular and precise tracking with less selection bias confirmed the experimental findings. The story illustrates the value of experimenting and collecting data on performance, then using those signals to investigate further and refine the design.
Links referenced
- TaskSpeed by Peter Higgins
- Browserscope
- Browserscope User Test howto