Spectral clustering algorithms for the detection of clusters in block-cyclic and block-acyclic graphs

We propose two spectral algorithms for partitioning the nodes of directed graphs in which the connections between groups of nodes follow, respectively, a cyclic and an acyclic pattern. Both methods are based on the computation of extremal eigenvalues of the transition matrix associated with the directed graph. On synthetic datasets, the two algorithms outperform state-of-the-art methods for directed graph clustering, including methods based on blockmodels, bibliometric symmetrization and random walks. Our algorithms have the same space complexity as classical spectral clustering algorithms for undirected graphs, and their time complexity is also linear in the number of edges in the graph. One of our methods is applied to a trophic network based on predator-prey relationships, where it successfully extracts common categories of prey and predators encountered in food chains. The same method is also applied to highlight the hierarchical structure of a worldwide network of Autonomous Systems depicting business agreements between Internet Service Providers.
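To make the role of the transition-matrix spectrum concrete, the following minimal sketch illustrates the block-cyclic case only. It is not the algorithm of the paper: the function name block_cyclic_labels, the synthetic graph parameters, the dense eigensolver and the phase-rounding step are all assumptions chosen for illustration. It relies on a standard fact: for a perfectly block-cyclic graph with k blocks, the row-stochastic transition matrix has the k-th roots of unity among its eigenvalues, and the eigenvector for exp(2πi/k) is constant on each block with a block-dependent phase.

```python
import numpy as np

def block_cyclic_labels(A, k):
    """Estimate block labels of a (nearly) block-cyclic directed graph.

    Hypothetical helper for illustration; A is a dense adjacency matrix
    with A[i, j] = 1 for an edge i -> j, and k is the number of blocks.
    """
    out_deg = A.sum(axis=1)
    # Row-stochastic transition matrix (guard against zero out-degree).
    P = A / np.maximum(out_deg, 1)[:, None]
    # Dense eigendecomposition: fine for a toy graph, but NOT linear-time.
    eigvals, eigvecs = np.linalg.eig(P)
    # Pick the eigenvalue closest to the primitive k-th root of unity.
    omega = np.exp(2j * np.pi / k)
    v = eigvecs[:, np.argmin(np.abs(eigvals - omega))]
    # Entry phases relative to node 0 are close to 2*pi*b/k for block offset b.
    phase = np.angle(v / v[0])
    return np.rint(phase * k / (2 * np.pi)).astype(int) % k

# Synthetic block-cyclic graph: dense edges block b -> block b+1, sparse noise elsewhere.
rng = np.random.default_rng(0)
k, n = 3, 40
truth = np.repeat(np.arange(k), n)
forward = (truth[None, :] - truth[:, None]) % k == 1
A = np.where(forward,
             rng.random((k * n, k * n)) < 0.6,
             rng.random((k * n, k * n)) < 0.01).astype(float)

labels = block_cyclic_labels(A, k)  # labels are offsets relative to node 0's block
accuracy = np.mean(labels == truth)
print(f"fraction of nodes assigned to the correct block: {accuracy:.2f}")
```

For large sparse graphs, a sparse iterative eigensolver that computes only a few extremal eigenpairs would be the natural replacement for the dense decomposition used here, and a clustering step on the eigenvector entries would be more robust than simple phase rounding.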
