Fuzzy Clustering in a Complex Network Based on Content Relevance and Link Structures

Many real-world problems can be represented as complex networks in which nodes represent objects and links represent relationships between them. Because objects can carry attributes, such networks contain rich content information in addition to nontrivial link structures, and finding interesting clusters by fully exploiting both content and link information is a significant challenge. Although some attempts have been made to tackle this clustering problem, few have considered the feasibility of identifying clusters in complex networks with a fuzzy clustering approach. We believe that taking into account the degree to which a node belongs to each cluster allows clusters to be identified more effectively, because it makes overlapping clusters detectable. In this paper, we therefore propose a fuzzy clustering algorithm for this task. The algorithm, which we call Fuzzy Clustering Algorithm for Complex Networks (FCAN), discovers clusters by taking both link and content information into consideration. It first processes the content information by introducing a measure that quantifies the relevance of contents between each pair of nodes in the network. It then leverages the link information in the clustering process through a measure of cluster density. Based on these measures, FCAN identifies fuzzy clusters that are more densely connected and more highly relevant in their contents, optimizing the degree of membership of each node in the different clusters. The performance of FCAN has been evaluated on several synthetic and real datasets, including ones for document classification and social community detection. The results show that FCAN can be a very promising approach in terms of accuracy, computational efficiency, and scalability.
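To make the idea of fuzzy membership driven by both content and links concrete, the following is a minimal illustrative sketch and not the FCAN algorithm itself, whose relevance and density measures are not specified in this abstract. It assumes cosine similarity as the content relevance between linked nodes and substitutes a simple relevance-weighted label-propagation update with fixed seed nodes for FCAN's optimization. The names `cosine_relevance`, `fuzzy_memberships`, and `seeds` are hypothetical.

```python
import math


def cosine_relevance(a, b):
    """Cosine similarity of two attribute vectors (an assumed relevance measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy_memberships(edges, attrs, seeds, k, iters=200):
    """Return an n x k membership matrix whose rows each sum to 1.

    edges: undirected links as (i, j) pairs
    attrs: one attribute vector per node
    seeds: {node: cluster} for nodes whose membership is held fixed (assumption)
    """
    n = len(attrs)
    # Weight every link by the content relevance of its two endpoints.
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        w = cosine_relevance(attrs[i], attrs[j])
        if w > 0:
            nbrs[i].append((j, w))
            nbrs[j].append((i, w))

    # Start from uniform memberships, then pin the seed nodes.
    u = [[1.0 / k] * k for _ in range(n)]
    for i, c in seeds.items():
        u[i] = [1.0 if d == c else 0.0 for d in range(k)]

    for _ in range(iters):
        new_u = []
        for i in range(n):
            total = sum(w for _, w in nbrs[i])
            if i in seeds or total == 0:
                new_u.append(u[i][:])
                continue
            # New membership: relevance-weighted average over linked neighbours.
            new_u.append(
                [sum(w * u[j][c] for j, w in nbrs[i]) / total for c in range(k)]
            )
        u = new_u
    return u


# Two triangles joined through node 6, whose attributes overlap with both groups.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 6), (3, 6)]
attrs = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 1]]
u = fuzzy_memberships(edges, attrs, seeds={0: 0, 5: 1}, k=2)
for node, row in enumerate(u):
    print(node, [round(x, 3) for x in row])
```

In this toy network, the nodes inside each triangle lean strongly toward their own cluster, while the bridging node 6 is split about evenly between the two, which is the kind of overlap a hard partition cannot express.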
