The Application of New Concepts of Dissimilarities between Nodes of a Graph to Collaborative Filtering

This work presents general procedures for computing dissimilarities between elements of a database or, more generally, between nodes of a weighted, undirected graph. It is based on a Markov-chain model of a random walk through the database: the model assigns transition probabilities to the links between elements, so that a random walker can jump from element to element. A quantity called the average first-passage time gives the average number of steps a random walker needs to reach element k for the first time when starting from element i. A closely related quantity, the average commute time, is the average number of steps needed to go from i to k and back to i; it provides a distance measure between any pair of elements. Both quantities, which represent dissimilarities between two elements, have the desirable property of decreasing when the number of paths connecting the two elements increases and when the "length" of any path decreases. The model is applied to a collaborative filtering task.
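
The quantities above can be illustrated on toy graphs. The following is a minimal sketch, not the authors' implementation. It assumes that the transition probabilities are obtained by row-normalizing the edge weights (p_ij = a_ij / sum_j a_ij), the usual choice for a random walk on a weighted graph; the abstract does not spell out the rule. The function names (transition_matrix, solve, first_passage_times, commute_time) are hypothetical. The first-passage times m(k|i) are found from the standard recurrence m(k|k) = 0 and m(k|i) = 1 + sum_{j != k} p_ij m(k|j), and the commute time is taken as n(i,k) = m(k|i) + m(i|k). It uses plain Python with a hand-written Gaussian elimination and solves one linear system per target node, which suits small examples but is not meant to scale to a real database.

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(adj: Matrix) -> Matrix:
    """Row-normalize a symmetric weight matrix: p_ij = a_ij / sum_j a_ij.

    ASSUMPTION: this is the transition rule; the abstract only says that
    probabilities are assigned to the links.
    """
    probs = []
    for row in adj:
        degree = sum(row)
        if degree == 0:
            raise ValueError("isolated node: the random walk is undefined")
        probs.append([w / degree for w in row])
    return probs


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def first_passage_times(adj: Matrix, k: int) -> List[float]:
    """Return m(k|i) for every start node i (expected steps to first reach k).

    For i != k: m(k|i) = 1 + sum_{j != k} p_ij m(k|j), i.e. (I - Q) m = 1,
    where Q is the transition matrix with row and column k removed.
    The system is nonsingular when the graph is connected.
    """
    p = transition_matrix(adj)
    others = [i for i in range(len(adj)) if i != k]
    a = [[(1.0 if i == j else 0.0) - p[i][j] for j in others] for i in others]
    sol = solve(a, [1.0] * len(others))
    times = [0.0] * len(adj)  # m(k|k) = 0
    for idx, i in enumerate(others):
        times[i] = sol[idx]
    return times


def commute_time(adj: Matrix, i: int, k: int) -> float:
    """n(i,k) = m(k|i) + m(i|k): expected length of the round trip i -> k -> i."""
    return first_passage_times(adj, k)[i] + first_passage_times(adj, i)[k]


# Toy graphs with unit weights: a 3-node path 0-1-2, and the same
# nodes with an extra direct edge 0-2 (a triangle).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

print(first_passage_times(path, 2))  # → [4.0, 3.0, 0.0]
print(commute_time(path, 0, 2))      # → 8.0
print(commute_time(triangle, 0, 2))  # → 4.0
```

On the path, the commute time between the end nodes is 8; adding a direct edge between them lowers it to 4, which illustrates the property stated above that the dissimilarity decreases as connecting paths are added. For the path, 8 also equals the total degree (4) times the effective electrical resistance between the end nodes (2), consistent with the known relation between commute times and electrical resistance.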
