NEIWalk: Community Discovery in Dynamic Content-Based Networks

Discovering dynamic communities has become an increasingly important task. Many algorithms have been proposed, most of which use only linkage structure. However, social networks also encode rich information in their content, namely node content and edge content, which is essential for discovering topically meaningful communities. To detect communities that are both structurally and topically meaningful, linkage structure, node content and edge content should therefore be integrated. The main challenge lies in integrating them dynamically and seamlessly. This paper proposes a novel transformation of a content-based network into a Node-Edge Interaction (NEI) network in which linkage structure, node content and edge content are embedded seamlessly. A differential-activity-based approach is proposed to incrementally maintain the NEI network as the content-based network evolves. To capture the semantic effect of different edge types, a transition probability matrix is devised for the NEI network. Based on this matrix, a heterogeneous random walk is applied to discover dynamic communities, leading to a new dynamic community detection method termed NEIWalk (NEI network based random Walk). Theoretical analysis shows that the accuracy loss NEIWalk incurs from random walk sampling is bounded. Experimental results demonstrate the effectiveness and efficiency of NEIWalk.
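
The idea of a transition probability matrix that weights edge types differently, followed by a random walk, can be illustrated with a minimal sketch. This is not the paper's construction: the three edge types (`incidence`, `node_content`, `edge_content`), the weights in `type_weights`, the toy five-vertex network, and the helper names `transition_matrix` and `random_walk` are all assumptions made for illustration. The abstract does not specify how NEIWalk builds its matrix or turns walks into communities.

```python
import numpy as np

def transition_matrix(adj_by_type, type_weights):
    """Row-normalize a weighted sum of per-edge-type adjacency matrices.

    Hypothetical helper: the weighting scheme is an assumption,
    not the definition used by NEIWalk.
    """
    n = next(iter(adj_by_type.values())).shape[0]
    combined = np.zeros((n, n))
    for edge_type, adj in adj_by_type.items():
        combined += type_weights[edge_type] * adj
    row_sums = combined.sum(axis=1)
    # Vertices with no outgoing weight get a self-loop so every row sums to 1.
    isolated = np.flatnonzero(row_sums == 0)
    combined[isolated, isolated] = 1.0
    row_sums[isolated] = 1.0
    return combined / row_sums[:, None]

def random_walk(P, start, length, rng):
    """Sample a walk of `length` steps from the row-stochastic matrix P."""
    path = [start]
    for _ in range(length):
        path.append(int(rng.choice(P.shape[0], p=P[path[-1]])))
    return path

# Toy network (assumed): vertices 0-2 stand for original nodes,
# vertices 3-4 stand for original edges (0-1) and (1-2).
n = 5
incidence = np.zeros((n, n))
for node, edge in [(0, 3), (1, 3), (1, 4), (2, 4)]:
    incidence[node, edge] = incidence[edge, node] = 1.0

node_content = np.zeros((n, n))
node_content[0, 1] = node_content[1, 0] = 1.0  # nodes 0 and 1 share content

edge_content = np.zeros((n, n))
edge_content[3, 4] = edge_content[4, 3] = 1.0  # edges 3 and 4 share content

adj_by_type = {
    "incidence": incidence,
    "node_content": node_content,
    "edge_content": edge_content,
}
type_weights = {"incidence": 1.0, "node_content": 0.5, "edge_content": 0.5}

P = transition_matrix(adj_by_type, type_weights)
rng = np.random.default_rng(0)
path = random_walk(P, start=0, length=10, rng=rng)
```

Raising the weight of one edge type makes the walk follow that kind of relation more often, which is one plausible way to express the "semantic effect of different edge types" in a single row-stochastic matrix. Vertices that co-occur frequently across many sampled walks could then be grouped, though that grouping step is left out here.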
