WaterlooClarke: TREC 2015 Total Recall Track

The Total Recall Track in TREC 2015 seeks improved models to accelerate the autonomous technology-assisted review process. This paper introduces several novel ideas: a clustering-based seed selection method, extended n-gram features, and continuous query expansion learned from the relevant documents identified in each iteration. These methods can retrieve more relevant documents in each iteration, thereby achieving high recall while requiring less review effort.
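As an illustration of what "continuous query expansion" can look like, here is a minimal sketch that assumes a simple frequency-based relevance-feedback scheme. After each review iteration, the most frequent terms from the newly identified relevant documents that are not yet in the query are appended to it. The names `tokenize` and `expand_query`, the stopword list, the sample documents, and the per-iteration cutoff `k` are hypothetical. This is not the exact method used in this paper.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "for", "on", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def expand_query(query: list[str], relevant_docs: list[str], k: int = 3) -> list[str]:
    """Append the k most frequent new terms found in this iteration's relevant documents."""
    counts = Counter()
    for doc in relevant_docs:
        counts.update(tokenize(doc))
    existing = set(query)
    # Rank unseen terms by descending frequency; break ties alphabetically for determinism.
    candidates = sorted(
        (term for term in counts if term not in existing),
        key=lambda term: (-counts[term], term),
    )
    return query + candidates[:k]


if __name__ == "__main__":
    query = ["solar", "panel"]
    # Each batch stands in for the documents judged relevant in one review iteration.
    batches = [
        ["Solar panel subsidies and tax credits", "Tax credits for solar installers"],
        ["Installers report rising subsidies", "Rooftop installers and utilities"],
    ]
    for batch in batches:
        query = expand_query(query, batch, k=2)
    print(query)  # ['solar', 'panel', 'credits', 'tax', 'installers', 'report']
```

The second iteration also adds the weakly related term "report". Under this sketch's assumptions, that shows why a real system would likely weight expansion terms or filter them, rather than appending raw top-frequency terms.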
