Benchmarking News Recommendations in a Living Lab

Most user-centric studies of information access systems in the literature suffer from unrealistic settings or from a limited number of participants. To address this issue, the idea of a living lab has been promoted. Living labs allow us to evaluate research hypotheses with a large number of users who satisfy their information needs in a real context. In this paper, we introduce a living lab for real-time news recommendation. The living lab was first organized as the News Recommendation Challenge at ACM RecSys’13 and then as the campaign-style evaluation lab NEWSREEL at CLEF’14. Within this lab, researchers were asked to provide news article recommendations to millions of users in real time. Unlike participants in user studies conducted in a laboratory, these users follow their own agenda. Consequently, laboratory bias in their behavior can be neglected. We outline the living lab scenario and the experimental setup of the two benchmarking events. We argue that this living lab can serve as a reference point for implementing living labs for the evaluation of information access systems.
