Collaborative topic modeling for recommending scientific articles

Researchers have access to large online archives of scientific articles. As a consequence, finding relevant papers has become more difficult. Newly formed online communities of researchers sharing citations provide a new way to solve this problem. In this paper, we develop an algorithm to recommend scientific articles to users of an online community. Our approach combines the merits of traditional collaborative filtering and probabilistic topic modeling. It provides an interpretable latent structure for users and items, and can form recommendations about both existing and newly published articles. We study a large subset of data from CiteULike, a bibliography sharing service, and show that our algorithm provides a more effective recommender system than traditional collaborative filtering.
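As a rough illustration of how a combination of collaborative filtering and topic modeling could score both existing and newly published articles, consider the minimal sketch below. It rests on assumptions that the abstract does not spell out: each article is represented by its topic proportions plus an offset learned from user libraries, each user by a vector in the same topic space, and a score is their inner product. The function names, vectors, and numbers are all hypothetical.

```python
# Illustrative sketch only: the representation, names, and numbers are
# assumptions made for this example, not details given in the abstract.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def item_vector(topic_proportions, offset=None):
    """Article representation: topic proportions plus an optional offset.

    The offset stands for what collaborative filtering would learn from
    user libraries. A newly published article has no such data, so it
    falls back to its content (topic proportions) alone.
    """
    if offset is None:
        return list(topic_proportions)
    return [t + e for t, e in zip(topic_proportions, offset)]

def predict(user_vector, topic_proportions, offset=None):
    """Predicted preference of a user for an article."""
    return dot(user_vector, item_vector(topic_proportions, offset))

# Hypothetical user interests over three topics.
user = [0.5, 1.0, 0.0]

# Existing article: content plus an offset learned from who saved it.
theta_old = [0.5, 0.25, 0.25]
offset_old = [0.5, 0.0, -0.25]

# Newly published article: content only.
theta_new = [0.0, 1.0, 0.0]

score_old = predict(user, theta_old, offset_old)
score_new = predict(user, theta_new)
```

Because users and articles share the topic space in this sketch, each coordinate of a user vector can be read as interest in a topic, which is one way the latent structure could be interpretable.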
