from google.appengine.api import memcache

stats = memcache.get_multi(browsers, **memcache_params)
for browser in browsers:
  if browser not in stats:
    # If we didn't find this key in memcache, do the datastore call.
    medians, num_scores = test_set.GetMediansAndNumScores(browser)
    # Cache the result (assuming the stored value is the
    # (medians, num_scores) pair; the exact shape is elided here).
    stats[browser] = (medians, num_scores)
# Update memcache with any new key/value pairs in the data structure.
memcache.set_multi(stats, **memcache_params)
The problem is that we were hitting DeadlineExceeded errors while walking through the list of browsers whenever the datastore calls to GetMediansAndNumScores() returned large result sets. Worse than one DeadlineExceeded error is doing nothing to prevent the next one. Conveniently, you can catch this exception in a try/except block on App Engine and then do something smart: store the state of things in memcache so that the next request doesn't start from scratch on a mission it will never complete. The updated code looks like this:
import logging

from google.appengine.api import memcache
from google.appengine.runtime import DeadlineExceededError

dirty = False
stats = memcache.get_multi(browsers, **memcache_params)
try:
  for browser in browsers:
    if browser not in stats:
      dirty = True
      medians, num_scores = test_set.GetMediansAndNumScores(browser)
      # As above, assuming the cached value is the (medians, num_scores) pair.
      stats[browser] = (medians, num_scores)
except DeadlineExceededError:
  # Try to get what we've got so far at least into memcache.
  memcache.set_multi(stats, **memcache_params)
  logging.info('Whew, made it.')
if dirty:
  memcache.set_multi(stats, **memcache_params)
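(For context: memcache_params is just a dict of keyword arguments splatted into get_multi() and set_multi(). The post doesn't show its contents, but since it's passed to both calls it can only hold kwargs they share, so it presumably looks something like this sketch, with made-up values:)

# Hypothetical contents of memcache_params; key_prefix and namespace are
# the kwargs accepted by both get_multi() and set_multi().
memcache_params = {
    'key_prefix': 'stats_',       # assumed key prefix
    'namespace': 'browserscope',  # assumed namespace
}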
This is just a cool pattern of defensive programming, and thankfully it works because after a DeadlineExceededError is raised, App Engine grants a short secondary deadline in which you get a little time to do a little work (like write something to memcache, as opposed to the datastore).
The end result of all this is that test result tables in Browserscope should be delivering more consistently and more quickly, and if we do respond with a 500 once, we at least might not have to on the next request. One thing I'm wondering about is adding a redirect to this process to try to run it again; since each attempt leaves more of the data in memcache, it will eventually succeed.
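A minimal sketch of what that redirect might look like, assuming the stats-building loop above re-raises the exception after saving its partial results. The handler name, BuildStats(), RenderTable(), and the retries query parameter are all hypothetical:

from google.appengine.ext import webapp
from google.appengine.runtime import DeadlineExceededError

class StatsPage(webapp.RequestHandler):  # hypothetical handler
  def get(self):
    retries = int(self.request.get('retries', '0'))
    try:
      stats = BuildStats()  # hypothetical wrapper around the loop above
      self.response.out.write(RenderTable(stats))  # hypothetical renderer
    except DeadlineExceededError:
      # Partial results are already in memcache, so each retry starts
      # with a warmer cache and gets further than the last one.
      if retries < 3:  # cap retries so one bad request can't loop forever
        self.redirect('%s?retries=%d' % (self.request.path, retries + 1))
      else:
        self.error(500)

Capping the retries keeps a truly hopeless request from bouncing between 302s forever, while still letting the common case recover on its own.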