Introduction to the Special Issue on Reproducibility in Information Retrieval

Information Retrieval (IR) is a discipline that has been strongly rooted in experimentation since its inception. Experimental evaluation has always been a strong driver of IR research and innovation, and these activities have been shaped by large-scale evaluation campaigns such as the Text REtrieval Conference (TREC) in the US, the Conference and Labs of the Evaluation Forum (CLEF) in Europe, the NII Testbeds and Community for Information access Research (NTCIR) in Japan and Asia, and the Forum for Information Retrieval Evaluation (FIRE) in India. IR systems are becoming increasingly complex: they need to cross language and media barriers; they span unstructured, semi-structured, and highly structured data; and they face diverse, complex, and frequently underspecified or ambiguous information needs, search tasks, and societal challenges. As a consequence, evaluation and experimentation, which remain fundamental, have in turn become increasingly sophisticated and challenging.
