Community Detection Based on Structure and Content: A Content Propagation Perspective

With recent advances in information networks, the problem of identifying group structure, or communities, has received significant attention. Most existing community detection and clustering methods focus on either the topological structure of a network or the attributes of its nodes, although both provide valuable information for characterizing communities. In this paper, we combine the topological structure of a network with the content information of its nodes to detect communities in information networks. Specifically, we treat a network as a dynamic system and consider its community structure to be a consequence of interactions among nodes. To model these interactions, we introduce the principle of content propagation, which naturally integrates structure and content. We describe the interactions among nodes in two ways: a linear model that approximates influence propagation, and a random walk that models the interactions directly. Based on this interaction modeling, we characterize communities by analyzing the stable state of the dynamic system. Extensive experiments on benchmark datasets show that the proposed framework outperforms state-of-the-art methods.
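
To make the idea of content propagation concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a random walk with restart as the interaction model: each node repeatedly mixes its neighbors' content with its own original content until the system stops changing, and communities are then read off the stable state. The function name `propagate_content`, the restart weight `alpha`, the toy graph, and the arg-max assignment rule are all illustrative assumptions.

```python
import numpy as np


def propagate_content(adj, content, alpha=0.8, tol=1e-10, max_iter=1000):
    """Iterate x <- alpha * P @ x + (1 - alpha) * x0 until it stabilizes.

    P is the row-normalized adjacency matrix (random-walk transitions).
    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    adj = np.asarray(adj, dtype=float)
    x0 = np.asarray(content, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0              # isolated nodes keep only their own content
    P = adj / deg
    x = x0.copy()
    for _ in range(max_iter):
        x_next = alpha * (P @ x) + (1.0 - alpha) * x0
        if np.abs(x_next - x).max() < tol:
            return x_next
        x = x_next
    return x


# Toy network (an assumption): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0

# Two-topic content per node; nodes 2 and 3 lean toward the "wrong" topic.
content = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.4, 0.6],
    [0.6, 0.4],
    [0.0, 1.0],
    [0.0, 1.0],
])

stable = propagate_content(adj, content)
raw_labels = content.argmax(axis=1)   # content only, ignoring structure
labels = stable.argmax(axis=1)        # structure and content combined

print("content only:     ", raw_labels.tolist())
print("after propagation:", labels.tolist())
```

On this toy graph, content alone misplaces nodes 2 and 3, while the stable state after propagation groups each triangle together. In a realistic setting the stable content vectors would more plausibly be passed to a clustering step than to a plain arg-max.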
