A Survey of Collaborative Filtering Techniques

As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences of other users. In this paper, we first introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, and privacy protection, together with their possible solutions. We then present three main categories of CF techniques: memory-based, model-based, and hybrid CF algorithms (which combine CF with other recommendation techniques), with examples of representative algorithms from each category and an analysis of their predictive performance and their ability to address the challenges. From basic techniques to the state of the art, we attempt to present a comprehensive survey of CF techniques that can serve as a roadmap for research and practice in this area.
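To make the core idea concrete, the following is a minimal sketch of one memory-based variant: user-based CF with Pearson correlation weights and a mean-centered weighted average for prediction. The toy `ratings` data and the helper names `mean`, `pearson`, and `predict` are illustrative assumptions, not taken from the survey, and the sketch omits refinements such as neighbor selection or significance weighting.

```python
from math import sqrt

# Hypothetical toy data (an assumption for illustration):
# user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 3, "D": 4},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def pearson(u, v):
    """Pearson correlation between users u and v over co-rated items."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    mu = mean(ratings[u][i] for i in common)
    mv = mean(ratings[v][i] for i in common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    den = den_u * den_v
    return num / den if den else 0.0


def predict(user, item):
    """Predict user's rating of item from other users' known ratings.

    Uses the user's own mean rating plus a similarity-weighted average
    of the other users' deviations from their own means.
    """
    base = mean(ratings[user].values())
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(user, other)
        num += w * (theirs[item] - mean(theirs.values()))
        den += abs(w)
    return base + num / den if den else base


if __name__ == "__main__":
    print(round(predict("alice", "D"), 2))
```

In this toy data, "alice" agrees with "bob" and disagrees with "carol", so both neighbors push the prediction for item "D" above her average: "bob" liked it, and the negatively correlated "carol" disliked it. Because the prediction is not clipped to the rating scale, a real system would typically clamp the result.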
