Learning Latent Factors for Community Identification and Summarization

Network communities, also known as network clusters, are typical latent structures in network data. Vertices within a community tend to interact more with one another and to share similar features. Community identification and feature summarization are therefore significant tasks in network analytics. Several approaches have been proposed for one task or the other, drawing on different categories of information carried by the network, e.g., edge structure, node attributes, or both. Few of them, however, can discover communities and summarize their features simultaneously. To address this challenge, we propose a novel latent factor model for community identification and summarization (LFCIS). LFCIS first formulates an objective function that evaluates the overall clustering quality, taking into account both the edge topology and the node features of the network. The objective function also includes a component that encourages vertices with similar local structures and similar features to be assigned to the same cluster. To identify the optimal cluster membership of each vertex, a convergent algorithm for updating the variables in the objective function is derived and used by LFCIS. LFCIS has been tested on six network data sets, including synthetic and real networks, and compared with several state-of-the-art approaches. The experimental results show that LFCIS outperforms most of the prevalent approaches to community discovery in social networks, and that it is able to identify the latent features that may characterize the discovered communities.
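The abstract does not give the LFCIS objective or its update rules, so the following is only a minimal sketch of the general idea of learning latent factors jointly from edge topology and node features. It substitutes a generic joint non-negative factorization, minimizing ||A - U U^T||^2 + lam * ||X - U V^T||^2, and uses a damped multiplicative heuristic for which no convergence guarantee is claimed. The function name joint_latent_factors, the weight lam, and the toy network are all hypothetical and are not taken from the paper.

```python
import numpy as np

def joint_latent_factors(A, X, k, lam=1.0, iters=300, seed=0, eps=1e-9):
    """Illustrative stand-in, NOT the LFCIS algorithm.

    A: (n, n) symmetric adjacency matrix, X: (n, m) non-negative node features.
    Returns U (n, k) community memberships and V (m, k) feature weights
    that approximately minimize ||A - U U^T||^2 + lam * ||X - U V^T||^2.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k)) + eps
    V = rng.random((m, k)) + eps
    for _ in range(iters):
        # Feature factor: standard multiplicative NMF rule for X ~ U V^T.
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
        # Membership factor: ratio of the negative to the positive part of
        # the gradient, damped by a square root (a heuristic choice).
        num = 2.0 * (A @ U) + lam * (X @ V)
        den = 2.0 * (U @ (U.T @ U)) + lam * (U @ (V.T @ V)) + eps
        U *= np.sqrt(num / den)
    return U, V

# Hypothetical toy network: two 4-node cliques, each with its own two features.
clique = np.ones((4, 4)) - np.eye(4)
A = np.block([[clique, np.zeros((4, 4))], [np.zeros((4, 4)), clique]])
X = np.zeros((8, 4))
X[:4, :2] = 1.0
X[4:, 2:] = 1.0

U, V = joint_latent_factors(A, X, k=2)
labels = U.argmax(axis=1)        # community identification: strongest factor per vertex
top_features = V.argmax(axis=0)  # summarization: heaviest feature per community
print(labels, top_features)
```

In this sketch the rows of U play the role of cluster memberships and the columns of V give a per-community feature profile, which is one simple way a single set of latent factors can serve both identification and summarization.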
