Paper Information - In-Network PCA and Anomaly Detection

In-Network PCA and Anomaly Detection

We consider the problem of network anomaly detection in large distributed systems. In this setting, Principal Component Analysis (PCA) has been proposed as a method for discovering anomalies by continuously tracking the projection of the data onto a residual subspace. This method was shown to work well empirically in highly aggregated networks, that is, those with a limited number of large nodes and at coarse time scales. This approach, however, has scalability limitations. To overcome these limitations, we develop a PCA-based anomaly detector in which adaptive local data filters send to a coordinator just enough data to enable accurate global detection. Our method is based on a stochastic matrix perturbation analysis that characterizes the tradeoff between the accuracy of anomaly detection and the amount of data communicated over the network.
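As a rough illustration of the centralized idea the abstract starts from, the following is a minimal sketch of residual-subspace detection: fit the top principal components on training data, project each new measurement onto the orthogonal (residual) subspace, and flag it when the squared norm of that projection is large. The synthetic data, the function names, the choice of k = 2 components, and the empirical 99.5th-percentile threshold are all assumptions made for this example; they are not taken from the paper, and the sketch does not include the local filters or the perturbation analysis the abstract describes.

```python
import numpy as np


def fit_residual_projector(Y, k):
    """Fit PCA on Y (rows = time steps, columns = monitored quantities).

    Returns the column means and the matrix that projects a centered
    measurement onto the residual subspace, i.e. the orthogonal
    complement of the top-k principal directions.
    """
    mean = Y.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    P = Vt[:k].T                          # shape (d, k)
    C_res = np.eye(Y.shape[1]) - P @ P.T  # residual projector, shape (d, d)
    return mean, C_res


def squared_prediction_error(y, mean, C_res):
    """Squared norm of the residual-subspace projection of one measurement."""
    residual = C_res @ (y - mean)
    return float(residual @ residual)


# Synthetic stand-in for traffic data (an assumption for this sketch):
# a rank-2 signal plus small isotropic noise.
rng = np.random.default_rng(0)
n, d, k = 500, 10, 2
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d))
Y = 5.0 * latent @ mixing + 0.1 * rng.normal(size=(n, d))

mean, C_res = fit_residual_projector(Y, k)
train_spe = np.array([squared_prediction_error(y, mean, C_res) for y in Y])

# Hypothetical threshold: an empirical quantile of the training errors.
threshold = np.percentile(train_spe, 99.5)

# Inject a spike into one coordinate of an otherwise ordinary measurement.
anomalous = Y[0].copy()
anomalous[3] += 5.0
is_anomaly = squared_prediction_error(anomalous, mean, C_res) > threshold
```

In this sketch every measurement is available in one place. The scalability concern raised in the abstract is that a real deployment would have to ship all of these measurements to the coordinator, which is the cost the filtered approach aims to reduce.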
