In-Network PCA and Anomaly Detection

We consider the problem of network anomaly detection in large distributed systems. In this setting, Principal Component Analysis (PCA) has been proposed as a method for discovering anomalies by continuously tracking the projection of the data onto a residual subspace. This method has been shown to work well empirically in highly aggregated networks, that is, networks with a limited number of large nodes that are monitored at coarse time scales. The approach has scalability limitations, however. To overcome them, we develop a PCA-based anomaly detector in which adaptive local data filters send a coordinator just enough data to enable accurate global detection. Our method is based on a stochastic matrix perturbation analysis that characterizes the tradeoff between the accuracy of anomaly detection and the amount of data communicated over the network.
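To make the residual-subspace idea concrete, the following is a minimal, centralized sketch in Python with NumPy. It is not the distributed detector described above: it omits the local filters and the coordinator entirely. The function names, the number of retained components `k`, the synthetic data, and the empirical-quantile threshold are all assumptions chosen for illustration.

```python
import numpy as np


def fit_residual_projector(X, k):
    """Fit PCA on the rows of X; return the mean and the projector onto the
    residual subspace (orthogonal complement of the top-k principal directions)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_top = Vt[:k].T  # shape (d, k)
    P_res = np.eye(X.shape[1]) - V_top @ V_top.T
    return mean, P_res


def squared_prediction_error(x, mean, P_res):
    """Squared norm of the projection of a centered observation onto the residual subspace."""
    r = P_res @ (x - mean)
    return float(r @ r)


# Hypothetical data: 500 observations of 10 measurements that lie close to a
# 2-dimensional subspace, plus small noise.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 10))
X = 10.0 * rng.normal(size=(500, 2)) @ basis + 0.1 * rng.normal(size=(500, 10))

mean, P_res = fit_residual_projector(X, k=2)
train_spe = np.array([squared_prediction_error(x, mean, P_res) for x in X])

# Assumption: an empirical quantile of the training scores stands in for a
# statistically derived threshold.
threshold = np.quantile(train_spe, 0.999)

# Inject a spike on one measurement and score the result.
anomaly = X[0].copy()
anomaly[0] += 5.0
anomaly_spe = squared_prediction_error(anomaly, mean, P_res)
print("threshold:", threshold)
print("anomaly flagged:", anomaly_spe > threshold)
```

In this sketch every observation is shipped to one place before scoring, which is exactly the communication cost that the filtering scheme in the abstract aims to reduce.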
