STELLA: Towards a Framework for the Reproducibility of Online Search Experiments

Reproducibility is a central aspect of both offline and online evaluations, as it allows the results of different teams and different experimental setups to be validated. In practice, however, reproducing an online evaluation is often difficult or even impossible: only a few data providers grant access to their systems, and when they do, access is limited in time and typically restricted to an official challenge. To alleviate this situation, we propose STELLA: a living lab infrastructure that offers consistent access to a data provider's system and can be used to train and evaluate search and recommender algorithms. In this position paper, we align STELLA's architecture with the PRIMAD model and its six components specifying reproducibility in online evaluations, and we illustrate two use cases with two academic search systems.