Feature Weighting and Instance Selection for Collaborative Filtering: An Information-Theoretic Approach

Abstract. Collaborative filtering (CF), which uses a database of consumer preferences to make personalized product recommendations, is achieving widespread success in e-commerce. However, it does not scale well as the number of consumers grows, and the quality of its recommendations must improve to earn more trust from consumers. This paper aims to improve both the accuracy and the efficiency of collaborative filtering. We present a unified information-theoretic approach to measuring the relevance of features and instances, and on that basis propose feature weighting and instance selection methods for collaborative filtering. Evaluated on the well-known EachMovie data set, the proposed methods demonstrate a significant improvement in accuracy and efficiency.
