In-Network PCA and Anomaly Detection

We consider the problem of network anomaly detection in large distributed systems. In this setting, Principal Component Analysis (PCA) has been proposed as a method for discovering anomalies by continuously tracking the projection of the data onto a residual subspace. This method has been shown to work well empirically in highly aggregated networks, that is, networks with a limited number of large nodes observed at coarse time scales. The approach, however, has scalability limitations. To overcome them, we develop a PCA-based anomaly detector in which adaptive local data filters send the coordinator just enough data to enable accurate global detection. Our method is based on a stochastic matrix perturbation analysis that characterizes the tradeoff between the accuracy of anomaly detection and the amount of data communicated over the network.
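As an illustration of the residual-subspace idea mentioned above, the following is a minimal, centralized sketch in Python. It is not the distributed detector developed in this work: it omits the adaptive local filters and the perturbation analysis entirely. The synthetic data, the choice of `k = 2` principal components, the empirical-quantile threshold, and the helper names `fit_residual_projector` and `spe` are all assumptions made for the example.

```python
import numpy as np

def fit_residual_projector(X, k):
    """Return (mean, P_res), where P_res projects onto the residual subspace,
    i.e. the orthogonal complement of the top-k principal axes of X."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                          # shape (n_features, k)
    P_res = np.eye(X.shape[1]) - V @ V.T  # projector onto the residual subspace
    return mean, P_res

def spe(x, mean, P_res):
    """Squared norm of the projection of a centered sample onto the residual subspace."""
    r = P_res @ (x - mean)
    return float(r @ r)

# Synthetic data (assumption): 500 time steps on 10 links, driven by
# 2 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

mean, P_res = fit_residual_projector(X, k=2)

# Empirical threshold taken from the training data (an assumption for this
# sketch, not a statistically derived control limit).
train_spe = np.array([spe(x, mean, P_res) for x in X])
threshold = np.quantile(train_spe, 0.999)

normal = X[0]
anomalous = X[0].copy()
anomalous[3] += 5.0                       # inject a spike on one link

print("normal SPE:   ", spe(normal, mean, P_res))
print("anomalous SPE:", spe(anomalous, mean, P_res))
print("flagged:", spe(anomalous, mean, P_res) > threshold)
```

In this sketch, a measurement is flagged when its residual energy exceeds the threshold. In the setting described above, the coordinator would have to carry out a comparable test using only the data that the local filters choose to forward.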
