Lazy Collaborative Filtering for Data Sets With Missing Values

Data sparsity is one of the biggest challenges in research on recommender systems. It arises mainly because users tend to rate only a small proportion of the huge number of available items. The issue is even more problematic for neighborhood-based collaborative filtering (CF) methods, because even fewer ratings are available in the neighborhood of the query item. In this paper, we address the data sparsity issue in the context of neighborhood-based CF. For a given query (user, item), a set of key ratings is first identified by taking the historical information of both the user and the item into account. An auto-adaptive imputation (AutAI) method is then proposed to impute the missing values in the set of key ratings. We present a theoretical analysis showing that the proposed imputation method effectively improves the performance of conventional neighborhood-based CF methods. Experimental results show that CF with AutAI outperforms six existing recommendation methods in terms of accuracy.
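
The pipeline outlined in the abstract (select key ratings for a query, impute the gaps among them, then predict from neighbors) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration. The key set is taken to be the users who rated the query item crossed with the items the query user rated. The imputation is a plain row-mean fill standing in for AutAI, whose actual rule is not given here. The function names `key_ratings` and `predict` are hypothetical.

```python
import math

def key_ratings(R, user, item):
    """Rows/columns of the assumed 'key' submatrix for a query (user, item).

    Assumption: rows are users who rated `item` (plus the query user);
    columns are items rated by `user` (plus the query item).
    R is a list of lists with None marking a missing rating.
    """
    rows = sorted({v for v in range(len(R)) if R[v][item] is not None} | {user})
    cols = sorted({j for j in range(len(R[0])) if R[user][j] is not None} | {item})
    return rows, cols

def predict(R, user, item, k=2):
    """User-based neighborhood prediction restricted to the key ratings."""
    rows, cols = key_ratings(R, user, item)
    observed = [R[v][j] for v in rows for j in cols if R[v][j] is not None]
    if not observed:
        return None  # nothing to learn from
    fallback = sum(observed) / len(observed)

    def row_mean(v):
        vals = [R[v][j] for j in cols if R[v][j] is not None]
        return sum(vals) / len(vals) if vals else fallback

    means = {v: row_mean(v) for v in rows}
    other = [j for j in cols if j != item]  # query column is excluded from similarity

    def centered(v):
        # Placeholder imputation (NOT AutAI): a missing entry is filled with
        # the row mean, which becomes 0.0 after mean-centering.
        return [R[v][j] - means[v] if R[v][j] is not None else 0.0 for j in other]

    cu = centered(user)
    nu = math.sqrt(sum(x * x for x in cu))
    sims = {}
    for v in rows:
        if v == user or R[v][item] is None:
            continue
        cv = centered(v)
        nv = math.sqrt(sum(x * x for x in cv))
        dot = sum(a * b for a, b in zip(cu, cv))
        sims[v] = dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

    # Keep the k most similar neighbors and combine their mean-centered ratings.
    top = sorted(sims, key=sims.get, reverse=True)[:k]
    weight = sum(abs(sims[v]) for v in top)
    if weight == 0:
        pred = means[user]
    else:
        pred = means[user] + sum(sims[v] * (R[v][item] - means[v]) for v in top) / weight
    return min(max(pred, min(observed)), max(observed))  # clip to the observed range

# Toy 4-user x 4-item rating matrix; None marks a missing rating.
R = [
    [5, 4, None, 1],
    [4, None, 4, 1],
    [None, 5, 5, None],
    [1, 2, None, 5],
]
rows, cols = key_ratings(R, user=0, item=2)
estimate = predict(R, user=0, item=2)
```

In this toy matrix, user 3 never rated item 2 and therefore falls outside the key set, so only users 1 and 2 can act as neighbors for the query. Replacing the row-mean fill with a more adaptive rule only requires changing `centered`.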
