Community Detection Based on Links and Node Features in Social Networks

Community detection is an important but challenging task in social network analysis. Many effective methods have been proposed, but most rely on either the topological structure or the node attributes alone. In this paper, building on SPAEM [1], we propose a joint probabilistic model that detects communities by combining node attributes with topological structure. We first construct a novel feature-based weighted network in which the weight of each edge is the feature similarity between the two nodes it connects. We then fuse the original network and the constructed network through a single parameter and apply the expectation-maximization (EM) algorithm to identify communities. Experiments on diverse datasets collected from Facebook and Twitter show that our algorithm achieves promising results compared with other algorithms.
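The construction of the feature-based network and its fusion with the original network can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the abstract does not state which similarity measure or fusion rule is used, so cosine similarity and a convex combination controlled by a hypothetical parameter `alpha` are assumptions here, as are the function names. The EM step that follows the fusion is omitted.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a node with no features is treated as dissimilar to all
    return dot / (norm_u * norm_v)


def fuse_networks(edges, features, alpha):
    """Return fused edge weights for an unweighted network.

    Each original edge has weight 1. The feature-based network assigns the
    same edge the similarity of its two endpoints. The fused weight is an
    assumed convex combination: (1 - alpha) * 1 + alpha * similarity.
    """
    fused = {}
    for i, j in edges:
        similarity = cosine_similarity(features[i], features[j])
        fused[(i, j)] = (1.0 - alpha) * 1.0 + alpha * similarity
    return fused


# Toy example: four nodes on a path, with binary feature vectors.
edges = [(0, 1), (1, 2), (2, 3)]
features = {0: [1, 0, 1], 1: [1, 0, 1], 2: [0, 1, 0], 3: [0, 1, 1]}
fused = fuse_networks(edges, features, alpha=0.5)
```

Under these assumptions, `alpha = 0` recovers the original topology and `alpha = 1` uses feature similarity alone; edges between nodes with similar features keep a higher weight than edges between dissimilar nodes.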

[1]  David Cohn,et al.  Learning to Probabilistically Identify Authoritative Documents , 2000, ICML.

[2]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[3]  Charles L.A. Clarke,et al.  SIGIR '07, Amsterdam : proceedings : 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 23-27, 2007, Amsterdam, the Netherlands , 2007 .

[4]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[5]  Ramesh Nallapati,et al.  Joint latent topic models for text and citations , 2008, KDD.

[6]  Alex Pothen,et al.  Partitioning Sparse Matrices with Eigenvectors of Graphs , 1990 .

[7]  Alexei Vazquez,et al.  Population stratification using a statistical model on hypergraphs , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[8]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res.

[9]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[10]  Charu C. Aggarwal,et al.  Community Detection with Edge Content in Social Media Networks , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[11]  Jon Kleinberg,et al.  Authoritative sources in a hyperlinked environment , 1999, SODA '98.

[12]  Srinivasan Parthasarathy,et al.  Efficient community detection in large networks using content and links , 2012, WWW.

[13]  David A. Cohn,et al.  The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity , 2000, NIPS.

[14]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm , 1977 .

[15]  Edoardo M. Airoldi,et al.  Mixed Membership Stochastic Blockmodels , 2007, NIPS.

[16]  Julian McAuley,et al.  Learning to Discover Social Circles in Ego Networks , 2012 .

[17]  Wei Ren,et al.  Simple probabilistic algorithm for detecting community structure. , 2009, Physical review. E, Statistical, nonlinear, and soft matter physics.

[18]  Boleslaw K. Szymanski,et al.  Overlapping community detection in networks: The state-of-the-art and comparative study , 2011, CSUR.

[19]  J. Lafferty,et al.  Mixed-membership models of scientific publications , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[20]  Thomas Hofmann,et al.  Probabilistic latent semantic indexing , 1999, SIGIR '99.

[21]  E A Leicht,et al.  Mixture models and exploratory analysis in networks , 2006, Proceedings of the National Academy of Sciences.

[22]  Yihong Gong,et al.  Combining content and link for classification using matrix factorization , 2007, SIGIR.

[23]  J. Ramasco,et al.  Inversion method for content-based networks. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.