Graphical models based hierarchical probabilistic community discovery in large-scale social networks

Real-world social networks, while disparate in nature, often consist of a set of loose clusters (a.k.a. communities) in which members are better connected to each other than to the rest of the network. In addition, such communities are often hierarchical, reflecting the fact that some communities are composed of a few smaller sub-communities. Discovering this complicated hierarchical community structure can give us a deeper understanding of the networks and their communities. This paper describes a scheme based on a hierarchical Bayesian model, namely the hierarchical social network-pachinko allocation model (HSN-PAM), for discovering probabilistic, hierarchical communities in social networks. The scheme builds on a previously developed hierarchical Bayesian model. In this scheme, communities are classified into two categories: super-communities and regular-communities. Two different network encoding approaches are explored to evaluate this scheme on research collaboration networks, including CiteSeer. The experimental results demonstrate that HSN-PAM is effective for discovering hierarchical community structures in large-scale social networks.
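The two-level hierarchy of super-communities and regular-communities can be illustrated with a small generative sketch. The abstract does not spell out the model, so everything below is an assumption made for illustration: the four-level pachinko-allocation-style layout (actor, super-community, regular-community, neighbor), the symmetric Dirichlet priors, and all function and parameter names (`sample_dirichlet`, `generate_neighbors`, `alpha`, `member_dists`) are hypothetical and are not taken from the paper.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    """Draw an index according to a discrete probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_neighbors(n_links, n_super, n_regular, member_dists, rng, alpha=0.5):
    """Generate the neighbors of one actor under an assumed two-level hierarchy.

    For each link: pick a super-community, then a regular-community
    conditioned on it, then a neighbor from that regular-community.
    """
    # Assumed: per-actor mixture over super-communities.
    theta_super = sample_dirichlet([alpha] * n_super, rng)
    # Assumed: per-actor mixture over regular-communities, one per super-community.
    theta_regular = [sample_dirichlet([alpha] * n_regular, rng)
                     for _ in range(n_super)]
    links = []
    for _ in range(n_links):
        s = sample_index(theta_super, rng)
        r = sample_index(theta_regular[s], rng)
        v = sample_index(member_dists[r], rng)
        links.append((s, r, v))
    return links

rng = random.Random(0)
n_actors, n_super, n_regular = 6, 2, 3
# Assumed: each regular-community is a distribution over all actors.
member_dists = [sample_dirichlet([0.5] * n_actors, rng) for _ in range(n_regular)]
links = generate_neighbors(5, n_super, n_regular, member_dists, rng)
for s, r, v in links:
    print(f"super-community {s} -> regular-community {r} -> neighbor {v}")
```

The sketch only samples from such a hierarchy; recovering the community assignments from an observed network would require an inference procedure (for example Gibbs sampling), which is not shown here.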
