STELLA: Towards a Framework for the Reproducibility of Online Search Experiments

Reproducibility is a central aspect of both offline and online evaluations, as it allows the results of different teams and different experimental setups to be validated. In practice, however, reproducing an online evaluation is often difficult or even impossible: only a few data providers grant access to their systems, and when they do, access is limited in time and typically restricted to an official challenge. To alleviate this situation, we propose STELLA: a living lab infrastructure that offers consistent access to a data provider's system and can be used to train and evaluate search and recommender algorithms. In this position paper, we align STELLA's architecture with the PRIMAD model and its six components specifying reproducibility in online evaluations, and we illustrate two use cases with two academic search systems.