Random Performance Differences Between Online Recommender System Algorithms

In the evaluation of recommender systems, the quality of recommendations made by a newly proposed algorithm is compared to the state of the art, using a given quality measure and dataset. The validity of the evaluation depends on the assumption that it does not exhibit artefacts resulting from the process of collecting the dataset. The main difference between online and offline evaluation is that in the online setting, the user's response to a recommendation is only observed once. We used the NewsREEL challenge to gain a deeper understanding of the implications of this difference for comparing different recommender systems. Our experiments aim to quantify the expected degree of variation in performance that cannot be attributed to differences between systems. We classify and discuss the non-algorithmic causes of the performance differences observed.
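To illustrate the kind of variation at issue (this is not the paper's experimental setup; the click probability and impression counts below are hypothetical assumptions), a minimal Python sketch simulates two identical recommenders that serve independent traffic with the same underlying click probability. Because each user response is observed only once, their measured click-through rates differ purely by chance.

```python
import random

def simulate_ctr(true_click_prob: float, n_impressions: int, rng: random.Random) -> float:
    """Observed click-through rate when each impression's click is a one-shot Bernoulli draw."""
    clicks = sum(rng.random() < true_click_prob for _ in range(n_impressions))
    return clicks / n_impressions

# Two "algorithms" with identical behaviour: same true click probability,
# independent streams of impressions (hypothetical parameter values).
rng = random.Random(42)
p, n = 0.01, 20_000
ctr_a = simulate_ctr(p, n, rng)
ctr_b = simulate_ctr(p, n, rng)
print(f"CTR A = {ctr_a:.4f}, CTR B = {ctr_b:.4f}, "
      f"relative gap = {abs(ctr_a - ctr_b) / p:.1%}")
```

Even with identical systems, the relative gap in observed CTR can be several percent at realistic traffic volumes, which is the kind of non-algorithmic performance difference the experiments aim to quantify.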