AMOS: An automated model order selection algorithm for spectral graph clustering

One of the longstanding problems in spectral graph clustering (SGC) is the so-called model order selection problem: automated selection of the correct number of clusters. This is equivalent to the problem of finding the number of connected components or communities in an undirected graph. In this paper, we propose AMOS, an automated model order selection algorithm for SGC. Based on a recent analysis of clustering reliability for SGC under the random interconnection model, AMOS works by incrementally increasing the number of clusters, estimating the quality of the identified clusters, and running a series of clustering reliability tests. Consequently, AMOS outputs clusters of minimal model order with statistical clustering reliability guarantees. Compared with three other automated graph clustering methods on real-world datasets, AMOS shows superior performance in terms of multiple external and internal clustering metrics.
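
The link between model order and connected components can be illustrated in the idealized case where clusters share no edges: the number of connected components then equals the multiplicity of the zero eigenvalue of the graph Laplacian L = D - A. The sketch below shows only this fact and is not AMOS itself (it performs none of the reliability tests); the function names, the tolerance, and the toy graph are assumptions chosen for illustration.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Unnormalized Laplacian L = D - A of an undirected graph."""
    degrees = adjacency.sum(axis=1)
    return np.diag(degrees) - adjacency


def count_zero_eigenvalues(adjacency, tol=1e-8):
    """Count Laplacian eigenvalues that are numerically zero.

    For an undirected graph this count equals the number of
    connected components. The tolerance is an illustrative choice.
    """
    eigenvalues = np.linalg.eigvalsh(graph_laplacian(adjacency))
    return int(np.sum(eigenvalues < tol))


def adjacency_from_edges(n_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from an edge list."""
    adjacency = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        adjacency[i, j] = 1.0
        adjacency[j, i] = 1.0
    return adjacency


# Hypothetical toy graph: a triangle, a single edge, and a 3-node path.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (5, 6), (6, 7)]
separated = adjacency_from_edges(8, toy_edges)
n_separated = count_zero_eigenvalues(separated)

# Adding one edge between the triangle and the single edge merges two components.
bridged = adjacency_from_edges(8, toy_edges + [(2, 3)])
n_bridged = count_zero_eigenvalues(bridged)
```

Once clusters are connected by even a few edges, the zero eigenvalue is no longer repeated and this simple count fails, which is the situation that calls for a statistical procedure such as the one the abstract describes.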
