Multi-type Relational Data Clustering for Community Detection by Exploiting Content and Structure Information in Social Networks

The popularity of social networks gives providers an opportunity to target specific user groups for applications such as viral marketing and customized programs. However, the volume and variety of data in a network make it hard to identify user communities effectively. Sparseness and heterogeneity in the network structure make it difficult to group users with similar interests, while the high dimensionality and sparseness of text make it difficult to find content-focused groups. We formulate the problem of discovering user communities with common interests as multi-type relational data (MTRD) learning over content and structural information, and propose a novel solution based on non-negative matrix factorization with added regularization. We empirically evaluate the proposed method on real-world Twitter datasets, benchmarking it against state-of-the-art community discovery and clustering methods.
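To make the idea of non-negative matrix factorization with added regularization concrete, the following is a minimal sketch of a generic graph-regularized NMF, not the MTRD formulation proposed in the paper, whose objective is not given in this abstract. It assumes a user-by-term content matrix `X` and a symmetric user-by-user adjacency matrix `A`, and minimizes `||X - U V^T||_F^2 + lam * tr(U^T (D - A) U)` with standard multiplicative updates. The function name, the parameter `lam`, and the toy data are illustrative assumptions.

```python
import numpy as np

def graph_regularized_nmf(X, A, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Factor X (users x terms) as U @ V.T with U, V >= 0.

    The graph term lam * tr(U.T @ (D - A) @ U) encourages users that are
    linked in A to receive similar rows in U (hypothetical setup, see lead-in).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k)) + eps   # user-to-community memberships
    V = rng.random((m, k)) + eps   # term-to-community loadings
    D = np.diag(A.sum(axis=1))     # degree matrix of the user graph
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        U *= (X @ V + lam * (A @ U)) / (U @ (V.T @ V) + lam * (D @ U) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Toy data (fabricated for illustration only): 6 users, 4 terms.
# Users 0-2 use terms 0-1 and follow each other; users 3-5 use terms 2-3.
X = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [3, 3, 0, 0],
              [0, 0, 2, 3],
              [0, 0, 3, 2],
              [0, 0, 3, 3]], dtype=float)
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)

U, V = graph_regularized_nmf(X, A, k=2)
labels = U.argmax(axis=1)  # assign each user to its strongest community
print(labels)
```

Taking the arg-max over each row of `U` is one simple way to read off a hard community assignment; the content term and the structure term both pull linked users with shared vocabulary into the same column.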
