Evaluating collaborative filtering recommender systems

Recommender systems have been evaluated in many, often incomparable, ways. In this article, we review the key decisions in evaluating collaborative filtering recommender systems: the user tasks being evaluated, the types of analysis and datasets being used, the ways in which prediction quality is measured, the evaluation of prediction attributes other than quality, and the user-based evaluation of the system as a whole. In addition to reviewing the evaluation strategies used by prior researchers, we present empirical results from the analysis of various accuracy metrics on one content domain, where all the tested metrics collapsed roughly into three equivalence classes. Metrics within each equivalence class were strongly correlated, while metrics from different equivalence classes were uncorrelated.
