Topic Models Conditioned on Relations

Latent Dirichlet allocation is a fully generative statistical language model that has proven successful at capturing both the content and the topics of a corpus of documents. It has recently been shown that relations among documents, such as hyperlinks or citations, allow information to be shared between documents and in turn improve topic generation. Although such models are fully generative, in many situations we are not actually interested in predicting relations among documents. In this paper, we therefore present a Dirichlet-multinomial nonparametric regression topic model that places a Gaussian process prior on the joint document and topic distributions, where the prior is a function of the document relations. On networks of scientific abstracts and of Wikipedia documents, we show that this approach meets or exceeds the performance of several baseline topic models.
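To make the idea of a relation-dependent prior concrete, the following is a minimal sketch and not the paper's actual implementation. It assumes that document relations are given as a symmetric adjacency matrix, that the Gaussian process covariance is a regularized Laplacian graph kernel, and that Dirichlet parameters are obtained by exponentiating the latent function values, in the spirit of Dirichlet-multinomial regression. The function names, the parameter `sigma2`, and the toy graph are all hypothetical choices made for illustration.

```python
import numpy as np


def regularized_laplacian_kernel(adjacency, sigma2=1.0):
    """Graph kernel K = (I + sigma2 * L)^-1, where L = D - A is the graph Laplacian.

    Assumption: this kernel is one common choice for graphs; the abstract
    does not say which covariance function the paper uses.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    n = adjacency.shape[0]
    return np.linalg.inv(np.eye(n) + sigma2 * laplacian)


def sample_document_topic_distributions(adjacency, num_topics, rng, sigma2=1.0):
    """Draw per-document topic distributions whose prior depends on the relations."""
    n = adjacency.shape[0]
    kernel = regularized_laplacian_kernel(adjacency, sigma2)
    # Small jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(kernel + 1e-9 * np.eye(n))
    # One latent GP function per topic, evaluated at every document:
    # each column of eta is a draw from N(0, kernel).
    eta = chol @ rng.standard_normal((n, num_topics))
    # Assumed exponential link: positive Dirichlet parameters per document.
    alpha = np.exp(eta)
    # Each document's topic distribution is drawn from its own Dirichlet.
    theta = np.vstack([rng.dirichlet(a) for a in alpha])
    return alpha, theta


# Toy relation graph: documents 0-1-2 form a chain, document 3 is isolated.
adjacency = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ],
    dtype=float,
)
rng = np.random.default_rng(0)
alpha, theta = sample_document_topic_distributions(adjacency, num_topics=3, rng=rng)
```

Under these assumptions, linked documents have correlated latent values and therefore tend to receive similar Dirichlet parameters, while the relations themselves are only conditioned on and never generated.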
