Correlating events with time series for incident diagnosis

As online services have become increasingly popular, incident diagnosis has emerged as a critical task for minimizing service downtime and ensuring high service quality. For most online services, incident diagnosis is conducted mainly by analyzing the large volume of telemetry data collected from the services at runtime. Time series data and event sequence data are two major types of telemetry data. Correlation analysis techniques are important tools that engineers widely use for data-driven incident diagnosis. Despite their importance, little previous work has addressed the correlation between these two heterogeneous data types for incident diagnosis: continuous time series data and temporal event data. In this paper, we propose an approach to evaluate the correlation between time series data and event data. Our approach can discover three important aspects of event-time series correlation in the context of incident diagnosis: existence of correlation, temporal order, and monotonic effect. Our experimental results on simulated data sets and two real data sets demonstrate the effectiveness of the algorithm.
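
The three aspects named above can be illustrated with a small sketch. It is not the algorithm proposed in the paper, which the abstract does not specify. It assumes the question can be framed as a two-sample problem: sub-series taken just before and just after each event are compared with randomly placed sub-series, and a simple permutation test on window means stands in for whatever test the paper uses. The names `windows`, `perm_pvalue`, and `correlate`, the window length `k`, and the threshold `alpha` are all hypothetical.

```python
import numpy as np

def windows(series, starts, k):
    """Stack the length-k sub-series that begin at each index in `starts`."""
    return np.stack([series[s:s + k] for s in starts])

def perm_pvalue(a, b, n_perm=2000, rng=None):
    """Two-sided permutation test on the difference of sample means."""
    rng = np.random.default_rng(rng)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # in-place relabelling of the two samples
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        hits += int(diff >= observed)
    return (hits + 1) / (n_perm + 1)

def correlate(series, event_idx, k=5, alpha=0.05, seed=0):
    """Illustrative event/time-series check (assumed design, not the paper's)."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    # Keep only events with a full window on both sides.
    ev = np.array([t for t in event_idx if k <= t <= len(series) - k])
    front = windows(series, ev - k, k).mean(axis=1)  # k samples before each event
    rear = windows(series, ev, k).mean(axis=1)       # k samples from each event on
    # Baseline: windows at random positions, which may overlap event windows.
    starts = rng.integers(0, len(series) - k + 1, size=4 * len(ev))
    baseline = windows(series, starts, k).mean(axis=1)

    p_front = perm_pvalue(front, baseline, rng=rng)
    p_rear = perm_pvalue(rear, baseline, rng=rng)

    # Existence: either side of the events looks unlike random windows.
    correlated = bool(p_front < alpha or p_rear < alpha)
    # Temporal order (assumed rule): a change seen only after the events
    # suggests the event comes first; a change before suggests the reverse.
    if p_front < alpha:
        order = "series -> event"
    elif p_rear < alpha:
        order = "event -> series"
    else:
        order = None
    # Monotonic effect (simplification): sign of the change across the event.
    effect = None
    if correlated:
        effect = "increase" if rear.mean() > front.mean() else "decrease"
    return {"correlated": correlated, "order": order, "effect": effect,
            "p_front": float(p_front), "p_rear": float(p_rear)}
```

Calling `correlate(series, event_idx)` with a one-dimensional array and a list of event positions returns the three judgements together with the two p-values. Because the random baseline can overlap the event windows, frequent events weaken the comparison; a practical version would need a more careful baseline and a test that uses the whole sub-series rather than its mean.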
