Evaluation as a service for information retrieval

How can we run large-scale, community-wide evaluations of information retrieval systems if we lack the ability to distribute the document collection on which the task is based? This was the challenge we faced in the TREC Microblog tracks over the past few years: Twitter's terms of service prohibit redistribution of tweets, so the collection could not simply be shipped to participants. In this paper, we present a novel evaluation methodology we dub "evaluation as a service", which was implemented at TREC 2013 to address these restrictions on data redistribution. The basic idea is that instead of distributing the document collection, we (the track organizers) provided a service API "in the cloud": participants accomplished the evaluation task by issuing queries against this API rather than searching a local copy of the collection. We outline the advantages and disadvantages of this evaluation methodology, and discuss how the approach might be extended to other evaluation scenarios.
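To make the interaction model concrete, the following is a minimal sketch of how a participant might use such a service API: submit a query, receive ranked results, and write them out in standard TREC run format. The endpoint URL, parameter names, and response fields are hypothetical placeholders introduced for illustration, not the actual TREC 2013 interface.

    # Hypothetical sketch of a participant's client for a cloud-based
    # evaluation service. The endpoint, query parameters, and JSON
    # fields ("docid", "score") are assumptions, not the real API.
    import json
    import urllib.parse
    import urllib.request

    API_BASE = "https://example.org/trec/search"  # hypothetical endpoint

    def search(query, max_results=1000):
        """Send a query to the evaluation service and return ranked hits.

        The participant never sees the raw collection, only the ranked
        results the service chooses to return.
        """
        params = urllib.parse.urlencode({"q": query, "limit": max_results})
        with urllib.request.urlopen(API_BASE + "?" + params) as resp:
            return json.load(resp)["results"]

    def write_trec_run(topic_id, hits, run_tag):
        """Print hits as standard TREC run-file lines for submission."""
        for rank, hit in enumerate(hits, start=1):
            # TREC run format: topic Q0 docid rank score tag
            print(topic_id, "Q0", hit["docid"], rank, hit["score"], run_tag)

    if __name__ == "__main__":
        write_trec_run("MB001", search("example topic query"), "myrun")

The key property this sketch illustrates is that all access to the documents is mediated by the service: participants interact with the collection only through query results, which is what makes evaluation possible without redistributing the data.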