Learning multiple graphs for document recommendations

The Web offers rich relational data with different semantics. In this paper, we address the problem of document recommendation in a digital library, where the documents are networked by citations and are associated with other entities through various relations. Because any single graph is sparse and graph construction is noisy, we propose a new method that combines multiple graphs to measure document similarities, using different factorization strategies according to the nature of each graph. In particular, the new method seeks a single low-dimensional embedding of documents that captures their relative similarities in a latent space. Based on the obtained embedding, we develop a new recommendation framework using semi-supervised learning on graphs. In addition, we address scalability and propose an incremental algorithm, which significantly improves efficiency by calculating the embedding only for newly arriving documents. The batch and incremental methods are evaluated on two real-world datasets prepared from CiteSeer. Experiments demonstrate a significant quality improvement for the batch method, and a significant efficiency improvement with tolerable quality loss for the incremental method.
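To make the idea of a single embedding shared across several graphs concrete, the following is a minimal sketch and not the factorization method of the paper. It assumes undirected graphs given as symmetric adjacency matrices, replaces the per-graph factorization strategies with a simple weighted sum of normalized adjacency matrices followed by an eigendecomposition, and ranks documents by cosine similarity instead of semi-supervised learning on graphs. The function names, weights, and toy graphs are all hypothetical.

```python
import numpy as np

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes stay zero."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def joint_embedding(graphs, weights, dim):
    """Embed documents with the top `dim` eigenvectors of the weighted sum
    of normalized graphs (an assumed stand-in for joint factorization)."""
    combined = sum(w * normalize(g) for w, g in zip(weights, graphs))
    vals, vecs = np.linalg.eigh(combined)  # combined is symmetric
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top]

def recommend(embedding, query, k):
    """Return the k documents closest to `query` by cosine similarity."""
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    unit = embedding / np.maximum(norms, 1e-12)
    scores = unit @ unit[query]
    scores[query] = -np.inf  # never recommend the query document itself
    order = np.argsort(scores)[::-1]
    return [int(i) for i in order[:k]]

def undirected(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an edge list."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj

# Toy data: six documents in two groups, {0, 1, 2} and {3, 4, 5}.
# Each graph alone is sparse; the second contains one noisy edge (2, 3).
citations = undirected(6, [(0, 1), (1, 2), (3, 4), (4, 5)])
coauthors = undirected(6, [(0, 2), (3, 5), (2, 3)])

embedding = joint_embedding([citations, coauthors], [0.5, 0.5], dim=2)
top = recommend(embedding, query=0, k=2)
print(top)
```

In this toy setting neither graph links document 0 to both 1 and 2, but the combined embedding places all three close together, which is the intuition behind combining sparse, noisy graphs.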
