An LDA-based Community Structure Discovery Approach for Large-Scale Social Networks

Community discovery has drawn significant research interests among researchers from many disciplines for its increasing application in multiple, disparate areas, including computer science, biology, social science and so on. This paper describes an LDA(latent Dirichlet Allocation)-based hierarchical Bayesian algorithm, namely SSN-LDA (simple social network LDA). In SSN-LDA, communities are modeled as latent variables in the graphical model and defined as distributions over the social actor space. The advantage of SSN-LDA is that it only requires topological information as input. This model is evaluated on two research collaborative networkst: CtteSeer and NanoSCI. The experimental results demonstrate that this approach is promising for discovering community structures in large-scale networks.

[1]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.

[2]  Dennis M. Wilkinson,et al.  A method for finding communities of related genes , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[3]  Wei Li,et al.  Pachinko allocation: DAG-structured mixture models of topic correlations , 2006, ICML.

[4]  Morris H. Sunshine,et al.  Locating Leaders in Local Communities: A Comparison of Some Alternative Approaches , 1963 .

[5]  C. Lee Giles,et al.  Efficient identification of Web communities , 2000, KDD '00.

[6]  Andrew McCallum,et al.  Topics over time: a non-Markov continuous-time model of topical trends , 2006, KDD '06.

[7]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[8]  Ernst Fehr,et al.  A Social Network Analysis of Research Collaboration in the Economics Community , 2022 .

[9]  John Scott Social Network Analysis , 1988 .

[10]  Thomas L. Griffiths,et al.  The Author-Topic Model for Authors and Documents , 2004, UAI.

[11]  W. Bruce Croft,et al.  LDA-based document models for ad-hoc retrieval , 2006, SIGIR.

[12]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Tom Minka,et al.  Expectation-Propogation for the Generative Aspect Model , 2002, UAI.

[14]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[15]  Bart Selman,et al.  Tracking evolving communities in large linked networks , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Robert B. Ash,et al.  Information Theory , 2020, The SAGE International Encyclopedia of Mass Media and Society.

[17]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[18]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[19]  Xiang Ji,et al.  Topic evolution and social interactions: how authors effect research , 2006, CIKM '06.

[20]  Robert L. Goldstone,et al.  The simultaneous evolution of author and paper networks , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[21]  Thomas Krichel,et al.  A social network analysis of research collaboration in theeconomics community , 2006 .

[22]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[23]  Luo Si,et al.  Adjusting Mixture Weights of Gaussian Mixture Model via Regularized Probabilistic Latent Semantic Analysis , 2005, PAKDD.

[24]  Leonard M. Freeman,et al.  A set of measures of centrality based upon betweenness , 1977 .

[25]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[26]  Gregor Heinrich Parameter estimation for text analysis , 2009 .

[27]  Hongyuan Zha,et al.  Probabilistic models for discovering e-communities , 2006, WWW '06.

[28]  Nando de Freitas,et al.  An Introduction to MCMC for Machine Learning , 2004, Machine Learning.

[29]  Robert E. Tarjan,et al.  Graph Clustering and Minimum Cut Trees , 2004, Internet Math..

[30]  M. Newman Coauthorship networks and patterns of scientific collaboration , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[31]  M E J Newman,et al.  Fast algorithm for detecting community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[32]  Jesse A. Stump,et al.  The structure and infrastructure of the global nanotechnology literature , 2006 .