Probabilistic Community Discovery Using Hierarchical Latent Gaussian Mixture Model

Complex networks exist in a wide array of diverse domains, ranging from biology, sociology, and computer science. These real-world networks, while disparate in nature, often comprise of a set of loose clusters(a.k.a communities), whose members are better connected to each other than to the rest of the network. Discovering such inherent community structures can lead to deeper understanding about the networks and therefore has raised increasing interests among researchers from various disciplines. This paper describes GWN-LDA (Generic weighted network-Latent Dirichlet Allocation) model, a hierarchical Bayesian model derived from the widely-received LDA model, for discovering probabilistic community profiles in social networks. In this model, communities are modeled as latent variables and defined as distributions over the social actor space. In addition, each social actor belongs to every community with different probability. This paper also proposes two different network encoding approaches and explores the impact of these two approaches to the community discovery performance. This model is evaluated on two research collaborative networks: CiteSeer and NanoSCI. The experimental results demonstrate that this approach is promising for discovering community structures in large-scale networks.

[1]  John Scott Social Network Analysis , 1988 .

[2]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[3]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[4]  Xiang Ji,et al.  Topic evolution and social interactions: how authors effect research , 2006, CIKM '06.

[5]  Weixiong Zhang,et al.  Identification and Evaluation of Weak Community Structures in Networks , 2006, AAAI.

[6]  Dennis M. Wilkinson,et al.  A method for finding communities of related genes , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[7]  Wei Li,et al.  Pachinko allocation: DAG-structured mixture models of topic correlations , 2006, ICML.

[8]  M. Newman Coauthorship networks and patterns of scientific collaboration , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[9]  M E J Newman,et al.  Fast algorithm for detecting community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  Hongyuan Zha,et al.  Probabilistic models for discovering e-communities , 2006, WWW '06.

[11]  C. Lee Giles,et al.  Efficient identification of Web communities , 2000, KDD '00.

[12]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[13]  Jesse A. Stump,et al.  The structure and infrastructure of the global nanotechnology literature , 2006 .

[14]  Peter Jehlar,et al.  Identification of Web Communities , 2003 .

[15]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Luo Si,et al.  Adjusting Mixture Weights of Gaussian Mixture Model via Regularized Probabilistic Latent Semantic Analysis , 2005, PAKDD.

[17]  Victor R. Lesser,et al.  A Multi-Agent Approach for Peer-to-Peer Based Information Retrieval System , 2004, AAMAS.

[18]  Leonard M. Freeman,et al.  A set of measures of centrality based upon betweenness , 1977 .

[19]  John Yen,et al.  An LDA-based Community Structure Discovery Approach for Large-Scale Social Networks , 2007, 2007 IEEE Intelligence and Security Informatics.

[20]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.