A Discriminative Topic Model using Document Network Structure

Document collections often have links between documents (citations, hyperlinks, or revisions), and which links are added is often based on topical similarity. To model these intuitions, we introduce a new topic model for documents situated within a network structure, integrating latent blocks of documents with a max-margin learning criterion for link prediction using topic- and word-level features. Experiments on a scientific paper dataset and a collection of webpages show that, by more robustly exploiting the rich link structure within a document network, our model improves link prediction, topic quality, and block distributions.
