Anomalous subgraph detection via Sparse Principal Component Analysis

Network datasets have become ubiquitous in many fields of study in recent years. In this paper we investigate a problem that applies to a wide variety of domains: detecting small, anomalous subgraphs in a background graph. We characterize the anomaly in a subgraph via the well-known notion of network modularity, and we show that the resulting optimization problem closely resembles Sparse Principal Component Analysis (Sparse PCA), a recently introduced statistical technique that extends the classical PCA algorithm. The exact version of our formulation is a hard combinatorial optimization problem, so we instead consider a recently introduced semidefinite programming relaxation of the Sparse PCA problem. Results on simulated data suggest that the technique is promising.
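To make the modularity-based notion of anomaly concrete, the following is a minimal sketch and not the paper's implementation. It assumes the standard Newman modularity matrix B = A - k k^T / (2m), where A is the adjacency matrix, k the degree vector, and m the number of edges, and it scores a candidate node subset by the quadratic form x^T B x for the unit-norm indicator vector x of that subset. The toy graph (a 12-node ring with a planted 4-node clique) and the helper names `modularity_matrix` and `subset_score` are hypothetical, chosen only for illustration. The sketch evaluates the objective on given subsets; it does not perform the Sparse PCA search or the semidefinite relaxation.

```python
from itertools import combinations


def modularity_matrix(n, edges):
    """Return B = A - k k^T / (2m) for an undirected simple graph on n nodes."""
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1.0
        A[j][i] = 1.0
    k = [sum(row) for row in A]   # node degrees
    two_m = sum(k)                # twice the number of edges
    return [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


def subset_score(B, subset):
    """x^T B x for the unit-norm indicator vector x of `subset`."""
    s = list(subset)
    return sum(B[i][j] for i in s for j in s) / len(s)


# Hypothetical toy data: a 12-node ring (background) plus a clique on nodes 0..3.
n = 12
ring = {(i, (i + 1) % n) for i in range(n)}
clique = set(combinations(range(4), 2))
B = modularity_matrix(n, ring | clique)

clique_score = subset_score(B, [0, 1, 2, 3])   # planted dense subgraph
other_score = subset_score(B, [6, 7, 8, 9])    # same-size piece of the ring
print(clique_score, other_score)
```

On this toy graph the planted clique scores higher than an equally sized stretch of the ring. A sparse, unit-norm vector that maximizes x^T B x is what a Sparse PCA formulation would search for.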
