Of the largest eigenvalue for modularity-based partitioning

Despite the popularity and broad range of spectral clustering algorithms, little work has addressed the statistical significance of their results. Spectral clustering partitions a network using the eigenvalues and eigenvectors of a matrix derived from it, such as the graph Laplacian or the adjacency matrix minus a null model. Although the distribution of the largest eigenvalue of these matrices is not known, random matrix theory provides analytical formulas for a family of matrices called the Gaussian random ensembles. We demonstrate that the Tracy-Widom mapping of the largest eigenvalue of Gaussian random ensembles can be modified to predict the distribution of the largest eigenvalue of the matrices used for modularity-based spectral clustering. Using this finding, we derive formulas that control the type I error rate of modularity-based partitions.
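
To make the objects in the abstract concrete, the following is a minimal sketch and not the method derived in the paper. It assumes the standard Newman null model, B = A - k k^T / (2m), as the "adjacency matrix minus a null model", and it samples the largest eigenvalue of plain Gaussian orthogonal ensemble (GOE) matrices as a stand-in reference distribution. The modified Tracy-Widom mapping itself is not reproduced here, and all function names are hypothetical.

```python
import numpy as np


def modularity_matrix(A):
    """Assumed null model (Newman): B = A - k k^T / (2m), undirected graph."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)          # node degrees
    two_m = k.sum()            # twice the number of edges
    return A - np.outer(k, k) / two_m


def spectral_bipartition(A):
    """Split nodes by the sign of the leading eigenvector of B.

    Returns the largest eigenvalue (the statistic whose null distribution
    is of interest) and a 0/1 label per node.
    """
    B = modularity_matrix(A)
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    lam_max = eigvals[-1]
    leading = eigvecs[:, -1]
    labels = (leading >= 0).astype(int)
    return lam_max, labels


def goe_largest_eigenvalues(n, trials, rng):
    """Monte Carlo sample of the largest eigenvalue of n x n GOE matrices.

    Entries are scaled so off-diagonal variance is 1 and diagonal variance
    is 2; the largest eigenvalue then concentrates near 2 * sqrt(n).
    """
    samples = np.empty(trials)
    for t in range(trials):
        G = rng.standard_normal((n, n))
        H = (G + G.T) / np.sqrt(2.0)
        samples[t] = np.linalg.eigvalsh(H)[-1]
    return samples


def two_cliques(size):
    """Toy graph: two cliques of `size` nodes joined by a single edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1.0
    A[size:, size:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[size - 1, size] = A[size, size - 1] = 1.0   # bridge edge
    return A
```

Calling `spectral_bipartition(two_cliques(5))` returns a positive largest eigenvalue and labels that separate the two cliques. Deciding whether such an eigenvalue is significant requires the null distribution for modularity matrices that the abstract describes; the raw GOE sample above only illustrates the kind of reference distribution involved and should not be used as that null without the paper's modification.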
