The MovieLens Datasets: History and Context

The MovieLens datasets are widely used in education, research, and industry. They are downloaded hundreds of thousands of times each year, reflecting their use in popular press programming books, traditional and online courses, and software. These datasets are a product of member activity in the MovieLens movie recommendation system, an active research platform that has hosted many experiments since its launch in 1997. This article documents the history of MovieLens and the MovieLens datasets. We include a discussion of lessons learned from running a long-standing, live research platform from the perspective of a research organization. We document best practices and limitations of using the MovieLens datasets in new research.
