An Exploration of Total Recall with Multiple Manual Seedings

Four different reviewers participated in every topic. For each topic, each reviewer was assigned to manually seed that topic either by issuing a single query (one-shot) and flagging (reviewing) the first (i.e. not necessarily the best) 25 documents returned by the query, or by running as many queries as desired (interactive) within a short time period, stopping after 25 documents had been reviewed. In the one-shot approach, the reviewer was not allowed to examine any documents before issuing the query; that is, the single query was issued "blind" after reading only the topic title and description. The first 25 documents returned by that query were flagged as having been reviewed, but because there was no further interaction with the system, it did not matter whether or not the reviewer spent any time looking at those documents.

In the interactive case, reviewers were free to read documents in as much or as little depth as they wished, issue as many or as few queries as they wished, and use whatever affordances the system offered for finding documents (e.g. synonym expansion, timeline views, communication tracking views, etc.). Every document the reviewer laid eyes on during this interactive period had to be flagged as having been seen and submitted to the Total Recall server, whether or not the reviewer believed the document to be relevant. This was done in order to correctly assess total effort, and therefore correctly measure gain (recall as a function of effort). We also note that the software did not strictly enforce the 25-document guideline; as a result, interactive reviewers sometimes went a few documents over or under their limit, as per natural human variance and mistake, but we do not consider this to be significant. Regardless, all documents reviewed, even duplicates, were noted and sent to the Total Recall server.

The reviewers working on each topic were randomly assigned to run that topic in either one-shot or interactive mode, so that each topic had two one-shot and two interactive 25-document starting points. For our one allowed official manual run, these starting points were combined (unioned) into a separate starting point. Because we did not control for overlap or duplication of effort, the union of these reviewed documents is often smaller than the sum: reviewers working asynchronously and without knowledge of each other often found (flagged as seen) exactly the same documents. In this paper, we augment the official run with a number of unofficial runs, four for each topic: two one-shot starting points and two interactive starting points. This is discussed further in Section 3.
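
As a concrete illustration of how the official starting point was formed from the four per-topic seed sets, the following is a minimal sketch of the union-with-deduplication step. The document identifiers, labels, and function name are hypothetical and stand in for whatever bookkeeping a participating team might use; they are not the actual pipeline.

from typing import Dict, List, Set

def union_seed_sets(seed_sets: Dict[str, List[str]]) -> Set[str]:
    """Union the per-reviewer seed lists for one topic, dropping duplicate
    document IDs. seed_sets maps a reviewer/mode label (e.g. "oneshot-A",
    "interactive-B") to the list of document IDs that reviewer flagged as seen.
    """
    official_seed: Set[str] = set()
    for docids in seed_sets.values():
        official_seed.update(docids)
    return official_seed

# Hypothetical example: four seed lists (normally ~25 documents each).
# Because reviewers overlap, the union is often smaller than the sum.
seeds = {
    "oneshot-A":     ["doc001", "doc002", "doc003"],
    "oneshot-B":     ["doc002", "doc004"],
    "interactive-A": ["doc003", "doc005"],
    "interactive-B": ["doc001", "doc006"],
}
official = union_seed_sets(seeds)
print(len(official), "unique documents vs",
      sum(len(v) for v in seeds.values()), "total reviewed")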