Abstract: It is appealing to imagine software packages that provide personally tailored product recommendations to a consumer. One way to predict the rating of a particular product by a particular consumer is through inference from a database of previous ratings by many consumers of many products. Such a database consists of triplets of the form (product-identifier, consumer-identifier, rating). Such databases will generally be sparse, but we may nevertheless hope to derive considerable predictive information from them. A number of groups have begun developing distributed systems to collect and predict consumer preferences. Some have put significant effort into implementation issues concerning user interfaces and the gathering and communicating of data via the Internet and Usenet. Rather than launching into the development of a distributed system to address a particular consumer preference domain, our goal is first to understand the computational and statistical nature of the general problem. In this paper we develop two algorithms for this purpose and relate them to a nearest-neighbor-based algorithm of Resnick et al. (1994). We then examine their predictive performance and quality of recommendations on a number of synthetic and real-world databases. The real-world results suggest that, in some but not all domains, a significant improvement can be obtained over simply recommending the most popular product. At the end of the paper we discuss computational expense on large databases, the use of explicit features, and our ideas for improved inference algorithms.
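As a rough illustration of the data representation and of the baseline mentioned above (recommending the most popular product), the following minimal sketch stores ratings as (product-identifier, consumer-identifier, rating) triplets and recommends the unrated product with the highest mean rating. The toy data, the function names, and the choice to define "most popular" as "highest mean rating" are all assumptions made for illustration; they are not taken from the paper, and the paper's own two algorithms are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy database of (product_id, consumer_id, rating) triplets.
# Most (product, consumer) pairs are absent, so the database is sparse.
ratings = [
    ("film_a", "ann", 5.0),
    ("film_a", "bob", 4.0),
    ("film_b", "ann", 2.0),
    ("film_b", "cat", 3.0),
    ("film_c", "bob", 4.0),
]


def mean_rating_per_product(triplets):
    """Return a dict mapping each product to the mean of its observed ratings."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for product, _consumer, rating in triplets:
        totals[product] += rating
        counts[product] += 1
    return {product: totals[product] / counts[product] for product in totals}


def recommend_most_popular(triplets, consumer):
    """Baseline: the highest mean-rated product this consumer has not rated.

    Assumption: "popular" means highest mean rating; other definitions
    (for example, most frequently rated) are equally plausible.
    Returns None if the consumer has already rated every product.
    """
    seen = {product for product, c, _rating in triplets if c == consumer}
    means = mean_rating_per_product(triplets)
    candidates = {p: m for p, m in means.items() if p not in seen}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Because this baseline ignores who gave each rating, every consumer with the same set of already-rated products receives the same recommendation. A personalized predictor would have to improve on exactly this.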
[1] M. Stone. Cross-Validatory Choice and Assessment of Statistical Predictions. 1976.
[2] G. Wahba, et al. A completely automatic french curve: fitting spline functions by cross validation. 1975.
[3] Jon Louis Bentley, et al. Multidimensional divide-and-conquer. CACM, 1980.
[4] George E. P. Box, et al. Empirical Model-Building and Response Surfaces. 1988.
[5] V. Rich. Personal communication. Nature, 1989.
[6] G. Box, et al. Empirical Model-Building and Response Surfaces. 1990.
[7] David A. Cohn, et al. Active Learning with Statistical Models. NIPS, 1996.
[8] John Riedl, et al. GroupLens: an open architecture for collaborative filtering of netnews. CSCW '94, 1994.
[9] Andrew W. Moore, et al. Efficient Algorithms for Minimizing Cross Validation Error. ICML, 1994.
[10] Pattie Maes, et al. Social information filtering: algorithms for automating "word of mouth". CHI '95, 1995.