Benchmarking - A Methodology for Ensuring the Relative Quality of Recommendation Systems in Software Engineering

This chapter describes the concepts involved in benchmarking recommendation systems. Benchmarking is used to ensure the quality of a research or production system in comparison to other systems, whether algorithmically, infrastructurally, or according to any other sought-after quality. Specifically, the chapter presents the evaluation of recommendation systems according to recommendation accuracy, technical constraints, and business values in the context of a multi-dimensional benchmarking and evaluation model that combines any number of qualities into a single comparable metric. The focus is on quality measures related to these three dimensions. The chapter first introduces concepts related to the evaluation and benchmarking of recommendation systems, continues with an overview of the current state of the art, and then presents the multi-dimensional approach in detail. It concludes with a brief discussion of the introduced concepts and a summary.
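As a rough illustration of how several qualities might be folded into one comparable number, the sketch below computes a weighted average over per-dimension scores. This is an assumption made for illustration only: the abstract does not specify the chapter's actual model, and the function name `combined_score`, the dimension names, the weights, and the example scores are all hypothetical. Each score is assumed to be already normalized to the range [0, 1], with higher meaning better.

```python
from typing import Dict


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-dimension scores (hypothetical aggregation).

    Assumes every score is already normalized to [0, 1], higher is better.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[d] * scores[d] for d in scores) / total_weight


# Hypothetical weights expressing how much each dimension matters.
weights = {"accuracy": 0.5, "technical": 0.25, "business": 0.25}

# Hypothetical normalized scores for two systems under comparison.
system_a = {"accuracy": 0.8, "technical": 0.4, "business": 0.6}
system_b = {"accuracy": 0.6, "technical": 0.8, "business": 0.8}

score_a = combined_score(system_a, weights)
score_b = combined_score(system_b, weights)
print(f"system A: {score_a:.2f}, system B: {score_b:.2f}")
```

With these made-up numbers, system B ranks higher overall despite its lower accuracy score, which is the kind of trade-off a multi-dimensional benchmark is meant to expose. The choice of weights is itself a judgment that depends on the deployment context.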
