XStream: Explaining Anomalies in Event Stream Monitoring

In this paper, we present the XStream system, which provides high-quality explanations for anomalous behaviors that users annotate on the results of monitoring based on complex event processing (CEP). Explanations in this setting must meet three requirements: conciseness, consistency with human interpretation, and prediction power. Most existing techniques cannot produce explanations that satisfy all three. The key technical contributions of this work are a formal definition of optimally explaining anomalies in CEP monitoring, and three techniques for, respectively, generating a sufficient feature space, characterizing the contribution of each feature to the explanation, and selecting a small subset of features as the optimal explanation. Evaluation using two real-world use cases shows that XStream can significantly outperform existing techniques in conciseness and consistency while achieving comparably high prediction power and retaining a highly efficient implementation of a data stream system.
