A Dynamic Programming Framework for Large-Scale Online Clustering on Graphs

Graph clustering, which groups graph data into clusters, is a fundamental technique for data analysis and has attracted great attention in recent years. In this paper, we present DPOCG, a dynamic programming framework for large-scale online clustering on graphs that improves the scalability of a wide range of graph clustering algorithms. Specifically, DPOCG first identifies the nodes of a large-scale graph whose states are unchanged from the previous time step and merges these unchanged nodes into supernodes, which greatly reduces the size of the graph at the current time step. It also collapses nodes whose degrees are below a predefined threshold. DPOCG then partitions the reduced graph into clusters using our density-based graph clustering algorithm (DGCM). In addition, we theoretically analyze DPOCG in terms of supernode generation, clustering on the reduced graph, and computational complexity. We evaluate DPOCG on one synthetic dataset and seven real-world datasets; the experimental results show that DPOCG requires less running time and improves the efficiency of clustering.
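The graph-reduction step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: a node counts as "unchanged" when its neighbour set is identical in both snapshots, unchanged nodes that shared a previous cluster label form one supernode, and a changed node whose degree is below the threshold is collapsed into its highest-degree neighbour that is not itself being collapsed. The names `build_adj`, `reduce_graph`, and `min_degree` are hypothetical, and the DGCM clustering step is not shown.

```python
from collections import defaultdict


def build_adj(edges):
    """Build an undirected adjacency map (node -> set of neighbours)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def reduce_graph(prev_adj, curr_adj, prev_labels, min_degree):
    """Return (mapping, reduced_adj) for the current snapshot.

    mapping sends every current node to its representative in the
    reduced graph; reduced_adj is the adjacency map over representatives.
    """
    mapping = {}
    changed = set()
    for node, nbrs in curr_adj.items():
        # Assumed notion of "unchanged": same neighbourhood as before
        # and a known cluster label from the previous time step.
        if node in prev_labels and prev_adj.get(node) == nbrs:
            mapping[node] = ("super", prev_labels[node])
        else:
            mapping[node] = node
            changed.add(node)

    # Assumed collapsing rule: a changed node with degree below the
    # threshold joins the representative of its highest-degree
    # neighbour that is not itself being collapsed.
    low = {n for n in changed if len(curr_adj[n]) < min_degree}
    for node in low:
        anchors = [n for n in curr_adj[node] if n not in low]
        if anchors:
            target = max(anchors, key=lambda n: (len(curr_adj[n]), str(n)))
            mapping[node] = mapping[target]

    # Rebuild adjacency over representatives, dropping internal edges.
    reduced = defaultdict(set)
    for node, nbrs in curr_adj.items():
        u = mapping[node]
        reduced[u]  # touch the key so isolated representatives appear
        for nbr in nbrs:
            v = mapping[nbr]
            if u != v:
                reduced[u].add(v)
    return mapping, dict(reduced)


# Two triangles joined by one edge; node "g" arrives at the current step.
prev_edges = [("a", "b"), ("b", "c"), ("a", "c"),
              ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
prev_adj = build_adj(prev_edges)
curr_adj = build_adj(prev_edges + [("f", "g")])
prev_labels = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

mapping, reduced = reduce_graph(prev_adj, curr_adj, prev_labels, min_degree=2)
print(len(curr_adj), "nodes reduced to", len(reduced))
```

In this toy example only "f" (which gained a neighbour) and the new node "g" are treated as changed, so a clustering algorithm would run on the reduced graph rather than on all seven original nodes. How DPOCG actually defines node state and chooses collapse targets may differ from these assumptions.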
