Remodeling the network for microgroup detection on microblog

In this paper, we focus on the problem of community detection on Sina Weibo, the most popular microblogging system in China. By characterizing the structure and content of microgroups (communities) on Sina Weibo in detail, we observe that, unlike in ordinary social networks, the degree assortativity coefficients of most microgroups are negative. In addition, we find that users from the same microgroup tend to share common attributes (e.g., followers, tags) and interests extracted from their published posts. Inspired by these findings, we propose a unified method that remodels the network for microgroup detection while preserving the information in both link structure and user content. First, link direction is taken into account by assigning greater weights to more surprising links, while content similarity is measured by the Jaccard coefficient of common features and by interest similarity based on the Latent Dirichlet Allocation model. Then, the link direction and content similarity between two users are jointly converted into the edge weight of a new, remodeled network, which is undirected and weighted. Finally, any of several widely used community detection algorithms that support weighted networks can be applied to the remodeled network. Extensive experiments on real-world social networks show that link structure and user content play almost equally important roles in microgroup detection on Sina Weibo. Our method outperforms traditional methods, improving average accuracy by up to 39% and reducing the number of unrecognized users by about 75%.
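The remodeling step can be illustrated with a minimal Python sketch. It is not the paper's formulation: the abstract does not give the weighting formulas, so the link term below (a mutual follow counts more than a one-way follow) is only a stand-in for the "surprising link" weighting, the LDA-based interest similarity is omitted, and the names `remodel`, `features`, and `alpha` are hypothetical. Only the Jaccard coefficient is the standard definition.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def remodel(directed_edges, features, alpha=0.5):
    """Collapse a directed follow graph into an undirected, weighted one.

    directed_edges: iterable of (follower, followee) pairs.
    features: dict mapping a user to a set of attributes (e.g. tags).
    alpha: hypothetical mixing parameter between link and content terms.

    Assumption: the link term is (number of directions present) / 2,
    a placeholder for the paper's direction-aware weighting.
    """
    directions = defaultdict(int)
    for u, v in set(directed_edges):  # set() drops duplicate edges
        if u != v:                    # ignore self-loops
            directions[frozenset((u, v))] += 1

    weights = {}
    for pair, count in directions.items():
        u, v = tuple(pair)
        link_term = count / 2.0
        content_term = jaccard(features.get(u, ()), features.get(v, ()))
        weights[pair] = alpha * link_term + (1 - alpha) * content_term
    return weights


edges = [("a", "b"), ("b", "a"), ("a", "c")]
features = {"a": {"x", "y"}, "b": {"y", "z"}, "c": set()}
weights = remodel(edges, features)
```

The resulting dictionary maps each unordered user pair to a single weight, which is the form of input expected by community detection algorithms that accept undirected weighted networks.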
