Agile Information Retrieval Experimentation with Terrier Notebooks

Teaching modern information retrieval benefits greatly from giving students hands-on experience with an open-source search engine they can experiment with. As such, open-source platforms such as Terrier are a valuable resource upon which learning exercises can be built. However, experimentation with such systems can be laborious when performed by hand: queries must be rewritten and executed, and model parameters tuned. Moreover, the rise of learning-to-rank as the de facto standard for state-of-the-art retrieval complicates this further, introducing training, validation and testing phases (often over multiple folded datasets representing different query types). Currently, students resort to shell scripting to ease experimentation, but this is far from ideal. On the other hand, the introduction of experimental pipelines in platforms such as scikit-learn and Apache Spark, in conjunction with notebook environments such as Jupyter, has been shown to markedly reduce the barriers for non-experts setting up and running experiments. In this paper, we discuss how next-generation information retrieval experimental pipelines can be composed in an agile manner using notebook-style interaction mechanisms. Building upon the Terrier IR platform, we describe how this is achieved using the recently released Terrier-Spark module and other recent changes in Terrier 5.0. Overall, this paper demonstrates the advantages of the agile nature of notebooks for experimental IR environments, from the classroom to academic and industry research labs.
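
As a concrete illustration of the style of experimentation advocated here, the following Scala notebook cell sketches how a learning-to-rank experiment might be declared as a Spark ML pipeline. It is a minimal sketch, not the paper's Terrier-Spark API: only standard Spark ML classes are used, and the toy DataFrame stands in for query-document features that retrieval over a Terrier index would produce.

    // A notebook cell sketching a folded learning-to-rank experiment as a Spark ML pipeline.
    // NOTE: the feature DataFrame below is a toy stand-in for features computed by Terrier
    // retrieval; only standard Spark ML components are used in this sketch.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.GBTRegressor
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
    import org.apache.spark.ml.evaluation.RegressionEvaluator

    val spark = SparkSession.builder().appName("ltr-notebook-sketch").master("local[*]").getOrCreate()

    // Toy (qid, docno, BM25, PL2, proximity, relevance label) rows, standing in for
    // query-document features that would be computed over a Terrier index.
    val run = spark.createDataFrame(Seq(
      ("q1", "d1", 12.3, 10.1, 0.4, 1.0),
      ("q1", "d2",  8.7,  9.0, 0.1, 0.0),
      ("q1", "d3",  6.2,  5.5, 0.0, 0.0),
      ("q2", "d4", 15.0, 13.2, 0.7, 2.0),
      ("q2", "d5",  9.9,  8.8, 0.2, 1.0),
      ("q2", "d6",  4.1,  3.9, 0.0, 0.0)
    )).toDF("qid", "docno", "bm25", "pl2", "prox", "label")

    // Assemble the per-document feature columns into a single vector column.
    val assembler = new VectorAssembler()
      .setInputCols(Array("bm25", "pl2", "prox"))
      .setOutputCol("features")

    // A simple pointwise ranker: gradient-boosted trees regressing the relevance label.
    val ranker = new GBTRegressor().setFeaturesCol("features").setLabelCol("label")

    val pipeline = new Pipeline().setStages(Array(assembler, ranker))

    // Cross-validated parameter tuning replaces hand-written shell scripts:
    // the grid, folds and evaluation measure are all declared in one cell.
    val grid = new ParamGridBuilder()
      .addGrid(ranker.maxDepth, Array(2, 3))
      .build()
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEstimatorParamMaps(grid)
      .setEvaluator(new RegressionEvaluator().setLabelCol("label").setMetricName("rmse"))
      .setNumFolds(2)

    val model = cv.fit(run)                 // train and validate over folds
    val rescored = model.transform(run)     // re-score: adds a "prediction" column per document
    rescored.select("qid", "docno", "prediction").show()

In a real notebook, the run DataFrame would instead be produced by querying a Terrier index (for example via the Terrier-Spark module) over a set of TREC topics and joining against qrels; the pipeline, parameter grid and cross-validation declared above would remain unchanged.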
