Sparsity-aware robust community detection (SPARCODE)

Community detection refers to finding densely connected groups of nodes in graphs. In important applications, such as cluster analysis and network modelling, the graph is sparse, but outliers and heavy-tailed noise may obscure its structure. We propose a new method for Sparsity-aware Robust Community Detection (SPARCODE). Starting from a densely connected, outlier-corrupted graph, we first extract a preliminary graph model with improved sparsity, in which the level of sparsity is optimized by mapping the coordinates from different clusters such that the distance between their embeddings is maximal. Next, undesired edges are removed, and the graph is constructed robustly by detecting outliers from the connectivity of the nodes in this improved graph model. Finally, fast spectral partitioning is performed on the resulting robust sparse graph model, and the number of communities is estimated by modularity optimization over the partitioning results. We compare SPARCODE with popular graph-based and cluster-based community detection approaches on a variety of benchmark network and cluster analysis data sets. Comprehensive experiments demonstrate that our method consistently finds the correct number of communities and outperforms existing methods in terms of detection performance, robustness, and modularity score, while requiring a reasonable computation time.
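The final stages of such a pipeline (robust pruning, spectral partitioning, and modularity-based selection of the number of communities) can be illustrated with a small sketch. It is not the authors' SPARCODE implementation and makes several assumptions: the input is an already constructed, unweighted graph given as an edge list (the sparsity-optimizing embedding step is not reproduced); the connectivity-based outlier detection is replaced by a hypothetical degree threshold `min_degree`; and spectral partitioning is done by recursive bisection on the sign of the Fiedler vector, approximated by power iteration. For each candidate number of communities up to a hypothetical `k_max`, the Newman-Girvan modularity Q = sum_c [ L_c/m - (d_c/(2m))^2 ] is evaluated, where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of node degrees in c; the partition with the largest Q is kept. All function and parameter names are illustrative.

```python
import math
import random
from collections import defaultdict


def modularity(adj, labels):
    """Newman-Girvan modularity Q = sum_c [L_c/m - (d_c/2m)^2] of an undirected graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    if m == 0:
        return 0.0
    internal = defaultdict(float)  # L_c: number of edges inside community c
    degsum = defaultdict(float)    # d_c: sum of node degrees in community c
    for u, nbrs in adj.items():
        degsum[labels[u]] += len(nbrs)
        for v in nbrs:
            if labels[u] == labels[v]:
                internal[labels[u]] += 0.5  # each edge is seen from both endpoints
    return sum(internal[c] / m - (degsum[c] / (2 * m)) ** 2 for c in degsum)


def fiedler_vector(adj, nodes, iters=500, seed=0):
    """Approximate Fiedler vector of the Laplacian L of the subgraph induced by `nodes`.

    Power iteration on c*I - L, with the all-ones vector projected out, converges to
    the eigenvector of the second-smallest eigenvalue of L.
    """
    idx = {u: i for i, u in enumerate(nodes)}
    nbrs = [[idx[v] for v in adj[u] if v in idx] for u in nodes]
    deg = [len(nb) for nb in nbrs]
    n = len(nodes)
    c = 2 * max(deg) + 1  # eigenvalues of L lie in [0, 2*max_degree], so c*I - L > 0
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the component along the all-ones vector
        # y = (c*I - L) x, i.e. (c - d_i) * x_i + sum of neighbour values
        y = [(c - deg[i]) * x[i] + sum(x[j] for j in nbrs[i]) for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return dict(zip(nodes, x))


def labels_of(communities):
    return {u: k for k, comm in enumerate(communities) for u in comm}


def estimate_communities(edges, k_max=6, min_degree=2):
    """Recursive spectral bisection; keep the partition with the highest modularity."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    # Hypothetical outlier rule: weakly connected nodes are set aside.
    outliers = {u for u in adj if len(adj[u]) < min_degree}
    kept = {u: adj[u] - outliers for u in adj if u not in outliers}

    communities = [sorted(kept)]
    best_q, best = modularity(kept, labels_of(communities)), list(communities)
    for _ in range(2, k_max + 1):
        target = max(communities, key=len)  # simplification: split the largest community
        if len(target) < 2:
            break
        f = fiedler_vector(kept, target)
        left = [u for u in target if f[u] >= 0]
        right = [u for u in target if f[u] < 0]
        if not left or not right:
            break
        communities.remove(target)
        communities += [left, right]
        q = modularity(kept, labels_of(communities))
        if q > best_q:
            best_q, best = q, list(communities)
    return best, best_q, outliers


if __name__ == "__main__":
    edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]        # clique 0..4
    edges += [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]  # clique 5..9
    edges += [(4, 5), (0, 10), (7, 11)]  # one bridge plus two pendant "outlier" nodes
    communities, q, outliers = estimate_communities(edges)
    print(communities, q, outliers)
```

On the toy graph in the `__main__` block (two 5-node cliques joined by one edge, plus two pendant nodes), the pendant nodes 10 and 11 are set aside as outliers and the two cliques are returned as the communities, with Q = 19/42 (about 0.452); splitting either clique further only lowers Q. Always splitting the largest community is a simplification; a greedy variant could instead split whichever community yields the largest modularity gain.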
