From Puppy to Maturity: Experiences in Developing Terrier

The Terrier information retrieval (IR) platform, maintained by the University of Glasgow, has been open source since 2004. Open source IR platforms are vital to the research community, as they provide state-of-the-art baselines and infrastructure, thereby alleviating the need to ‘reinvent the wheel’. Moreover, the open source nature of Terrier is critical, since it enables researchers to build their own novel research on top of it rather than treating it as a black box. In this position paper, we describe our experiences in developing the Terrier platform for the community. Furthermore, we discuss our vision for Terrier over the next few years and provide a roadmap for the future.
