PCA-based network-wide correlated anomaly event detection and diagnosis

High-performance computing environments that support large-scale distributed computing applications rely on multi-domain network performance measurements from open frameworks such as perfSONAR. Network-wide correlated anomaly events that can degrade data throughput need to be detected and reported quickly and accurately to keep such environments operating smoothly. Because network topology information is not always available alongside the measurement data, identifying and locating these events is challenging. In this paper, we present a novel PCA-based correlated anomaly event detection scheme that fuses multiple measurement time series and transforms them using principal component analysis. Using actual perfSONAR one-way delay measurement datasets, we demonstrate that our scheme can: (a) effectively distinguish between correlated and uncorrelated anomalies, (b) leverage a source-side vantage point to diagnose whether a correlated anomaly event is located in the local domain or in an external domain, and (c) act as a “black-box” correlation analysis tool that provides key insights for eventual root-cause identification.
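As a rough illustration of how fusing several measurement time series and applying principal component analysis can separate correlated from uncorrelated anomalies, consider the minimal sketch below. It is not the scheme presented in the paper. The delay values are synthetic, and the helper names (`principal_components`, `paths_in_top_component`) and the loading threshold of 0.3 are assumptions made purely for illustration.

```python
import numpy as np

def principal_components(X):
    """PCA via SVD on a (T, P) matrix: T time samples, P measurement paths.

    Returns the explained-variance ratio per component and the loading
    vectors (one row per component, one column per path).
    """
    Xc = X - X.mean(axis=0)                      # center each path's series
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    return var / var.sum(), vt

def paths_in_top_component(X, threshold=0.3):
    """Indices of paths whose loading on the first component exceeds
    `threshold` in magnitude (the threshold is an illustrative assumption)."""
    _, vt = principal_components(X)
    return np.flatnonzero(np.abs(vt[0]) > threshold)

# Synthetic one-way delay series (ms) for 4 hypothetical paths.
rng = np.random.default_rng(0)
T, P = 200, 4
base = 20.0 + rng.normal(0.0, 0.1, size=(T, P))

correlated = base.copy()
correlated[100:120, :3] += 5.0    # same delay shift on paths 0, 1 and 2

uncorrelated = base.copy()
uncorrelated[100:120, 0] += 5.0   # delay shift on path 0 only

print(paths_in_top_component(correlated))
print(paths_in_top_component(uncorrelated))
```

On this synthetic data, the first component of the correlated case loads on paths 0, 1 and 2 together, while in the uncorrelated case it loads on path 0 alone. The number of paths sharing the dominant component is therefore one simple indicator of whether an anomaly is correlated across paths.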
