A Spectral Framework for Anomalous Subgraph Detection

Many application domains work with data consisting of entities and their relationships or connections, formally represented as graphs. Across these domains, a common problem is detecting a subset of entities whose connectivity is anomalous with respect to the rest of the data. While the detection of such anomalous subgraphs has received substantial attention, no application-agnostic framework exists for analyzing signal detectability in graph-based data. In this paper, we describe a framework that enables such analysis using the principal eigenspace of a graph's residuals matrix, commonly called the modularity matrix in community detection. Using this analytical tool, we show that the framework has a natural power metric: the spectral norm of the anomalous subgraph's adjacency matrix (signal power) and of the background graph's residuals matrix (noise power). We propose several algorithms based on spectral properties of the residuals matrix, with the more computationally expensive techniques providing greater detection power. Detection and identification performance are presented for a number of signal and noise models, including clusters and bipartite foregrounds embedded into simple random backgrounds, as well as graphs with community structure and realistic degree distributions. The observed trends agree with intuition from other signal processing areas, such as greater detection power when the signal is embedded within a less active portion of the background. We demonstrate the utility of the proposed techniques in detecting small, highly anomalous subgraphs in real graphs derived from Internet traffic and product co-purchases.
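As a rough illustration of the quantities named in the abstract, the sketch below builds a residuals matrix and compares signal power with noise power. It assumes the standard modularity form B = A - k k^T / (2m) for an undirected graph with degree vector k and m edges; the abstract does not state which expected-value model the paper uses, so this choice is an assumption. The function names, the Erdős–Rényi background, and the embedded clique are all hypothetical and chosen only for the example.

```python
import numpy as np


def residuals_matrix(adj):
    """Residuals (modularity) matrix B = A - k k^T / (2m).

    Assumes an undirected graph with at least one edge.
    """
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # sum of degrees equals 2m
    return adj - np.outer(degrees, degrees) / two_m


def spectral_norm(mat):
    """Largest singular value of a matrix."""
    return float(np.linalg.norm(mat, 2))


def principal_eigenpair(sym):
    """Largest eigenvalue and its eigenvector for a symmetric matrix."""
    vals, vecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    return float(vals[-1]), vecs[:, -1]


def demo(n=200, p=0.05, k=12, seed=0):
    """Embed a k-vertex clique in a simple random background (illustrative)."""
    rng = np.random.default_rng(seed)

    # Background: symmetric random graph without self-loops.
    upper = np.triu(rng.random((n, n)) < p, 1)
    background = (upper | upper.T).astype(float)

    # Signal: a clique on the first k vertices.
    signal = np.zeros((n, n))
    signal[:k, :k] = 1.0 - np.eye(k)

    # Observed graph: union of background and signal edges.
    observed = np.maximum(background, signal)

    top_val, top_vec = principal_eigenpair(residuals_matrix(observed))
    return {
        "signal_power": spectral_norm(signal),
        "noise_power": spectral_norm(residuals_matrix(background)),
        "top_eigenvalue": top_val,
        "top_eigenvector": top_vec,
    }


if __name__ == "__main__":
    result = demo()
    print("signal power:", result["signal_power"])
    print("noise power:", result["noise_power"])
    print("top residual eigenvalue:", result["top_eigenvalue"])
```

In this toy setting the signal power of a k-vertex clique is k - 1, so varying `k` and `p` gives a quick sense of when the signal power exceeds the noise power. It is not a reproduction of the detection algorithms proposed in the paper.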
