Benchmarking News Recommendations: The CLEF NewsREEL Use Case

The CLEF NewsREEL challenge is a campaign-style evaluation lab that allows participants to evaluate and optimize news recommender algorithms. The goal is to create an algorithm that recommends news items users are likely to click on, while respecting a strict time constraint. The lab challenges participants to compete either in a "living lab" (Task 1) or in an evaluation that replays recorded streams (Task 2). In this report, we discuss the objectives and challenges of the NewsREEL lab, summarize last year's campaign, and outline the main research challenges that can be addressed by participating in NewsREEL 2016.
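To make the two evaluation settings concrete, the sketch below shows a deliberately simple most-popular recommender evaluated in the stream-replay style of Task 2. It is a minimal illustration, not the NewsREEL infrastructure: the event format, the `replay` helper, and the `clicked_item` field are hypothetical simplifications, and the 100 ms budget merely stands in for the challenge's strict response-time constraint; the living lab (Task 1) instead serves real users online and measures actual clicks.

```python
import time
from collections import Counter, deque


class MostPopularRecommender:
    """Baseline: suggest the items clicked most often within a sliding window,
    so that recent (fresh) articles dominate the ranking."""

    def __init__(self, window_size=10_000):
        self.recent_clicks = deque(maxlen=window_size)
        self.counts = Counter()

    def update(self, item_id):
        # Evict the oldest click from the counts before the deque drops it.
        if len(self.recent_clicks) == self.recent_clicks.maxlen:
            oldest = self.recent_clicks[0]
            self.counts[oldest] -= 1
            if self.counts[oldest] <= 0:
                del self.counts[oldest]
        self.recent_clicks.append(item_id)
        self.counts[item_id] += 1

    def recommend(self, exclude=frozenset(), k=6):
        ranked = (item for item, _ in self.counts.most_common())
        return [item for item in ranked if item not in exclude][:k]


def replay(events, recommender, time_budget=0.1):
    """Replay a recorded interaction stream (Task 2 style): feed clicks to the
    recommender, answer recommendation requests, and track whether the item
    that was later clicked appears among the suggestions and whether the
    response stayed within the time budget."""
    hits = requests = timeouts = 0
    for event in events:  # hypothetical format: {'type', 'item', 'clicked_item'}
        if event["type"] == "click":
            recommender.update(event["item"])
        elif event["type"] == "recommendation_request":
            start = time.perf_counter()
            suggestions = recommender.recommend(exclude={event["item"]})
            elapsed = time.perf_counter() - start
            requests += 1
            timeouts += elapsed > time_budget
            hits += event.get("clicked_item") in suggestions
    return {"click_rate": hits / max(requests, 1), "timeouts": timeouts}
```

A sliding window is one simple way to capture the strong recency effect in news consumption; participants typically replace such a baseline with context-aware or ensemble methods, but the request/response loop and the time budget remain the same in both tasks.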
