Modeling Network with Topic Model and Triangle Motif

This paper describes a hierarchical model that combines triangle motifs with a topic model, taking into account both network structure and node attributes. The node attribute we study is text, so we take document networks as our object of study. We represent the document network with triangle motifs, a representation that scales well to large amounts of data. With this representation, the complexity of our approach grows linearly in the number of documents and is more closely tied to the maximum degree of the network. We extend hLDA by remodeling it to incorporate network data. Because the model is non-parametric Bayesian, our approach does not require the branching factor at each non-terminal node to be specified in advance. The model is suitable for large-scale networks of academic abstracts, web documents, and related news articles.
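As a rough illustration of the triangle-motif representation, the sketch below enumerates three-node motifs centred on each document of an undirected network. It is not the authors' implementation: the function name `triangle_motifs`, the adjacency-dictionary input, and the split into closed triangles and open wedges are assumptions made for this example. The work done per node is bounded by the number of neighbour pairs, so for a fixed maximum degree the total cost grows linearly in the number of documents.

```python
from itertools import combinations


def triangle_motifs(adj):
    """Enumerate 3-node motifs centred on each node (illustrative sketch).

    adj: dict mapping node -> set of neighbours (undirected, no self-loops).
    Returns (closed, wedges):
      closed - set of frozensets {a, b, c} in which all three edges exist
      wedges - list of (centre, u, v) where centre links to u and v,
               but u and v are not linked to each other
    """
    closed = set()
    wedges = []
    for centre, nbrs in adj.items():
        # At most d * (d - 1) / 2 pairs per node, where d is the node's degree.
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:
                closed.add(frozenset((centre, u, v)))
            else:
                wedges.append((centre, u, v))
    return closed, wedges


# Hypothetical toy citation network: documents 0, 1, 2 cite each other,
# and document 3 is linked only to document 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
closed, wedges = triangle_motifs(toy)
print(closed)
print(wedges)
```

On the toy network, the single closed triangle is found once per member and deduplicated by the set, while the two wedges centred on document 2 record the missing links to document 3.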
