Multi-Level Event and Anomaly Correlation Based on Enterprise Architecture Information

Growing IT landscapes within and across enterprises are becoming increasingly complex, which complicates root cause analysis and calls for automated support. This paper presents an approach to correlate events, such as anomalies in multi-level monitoring stream data (for instance, conversion rates or network load). Other events, e.g. operational activities like application deployments, as well as marketing activities, can be taken into account, too. We exploit an Enterprise Architecture (EA) documented as a graph to focus on correlations between elements whose relationships are already known. To this end, we identify different types of data sources. We present MLAC, a minimal prototype that provides first evidence of the approach's feasibility, in particular for correlating events with level-shift anomalies in an artificial web-shop setup. The prototype includes a dynamic visualization of the correlations in the EA graph.
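The abstract does not specify how MLAC detects level shifts or decides that an event and an anomaly are correlated. The following is therefore only a minimal sketch under stated assumptions. Level shifts are detected with a simple sliding-window mean comparison, which is a stand-in heuristic and not the authors' detector. The EA is modeled as an undirected graph. An event is considered correlated with an anomaly if it occurred shortly before the anomaly on an element that is linked to the anomalous element within a few hops. All function names, element names, and the parameters `window`, `threshold`, `max_hops`, and `max_lag` are hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


def detect_level_shifts(series, window=5, threshold=3.0):
    """Flag time indices where the mean of the next `window` points differs
    from the mean of the previous `window` points by more than `threshold`
    times the average within-window standard deviation. This simple
    mean-shift heuristic is an assumption, not MLAC's documented detector."""
    shifts, run = [], []
    for t in range(window, len(series) - window + 1):
        before, after = series[t - window:t], series[t:t + window]
        # Tiny floor avoids division by zero on perfectly flat windows.
        spread = max((pstdev(before) + pstdev(after)) / 2, 1e-9)
        score = abs(mean(after) - mean(before)) / spread
        if score > threshold:
            run.append((score, t))
        elif run:
            # Neighbouring indices around one shift all score high;
            # report only the strongest index of each run.
            shifts.append(max(run)[1])
            run = []
    if run:
        shifts.append(max(run)[1])
    return shifts


def build_graph(edges):
    """Undirected adjacency sets for a (hypothetical) EA graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hop_distance(graph, source, target):
    """Breadth-first search; returns the number of hops between two EA
    elements, or None if no documented relationship path exists."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def correlate(anomalies, events, graph, max_hops=2, max_lag=3):
    """Pair an anomaly (time, element) with an event (time, element, label)
    when the event happened at most `max_lag` steps earlier on an element
    within `max_hops` of the anomalous one in the EA graph. Both limits
    are illustrative assumptions."""
    pairs = []
    for a_time, a_elem in anomalies:
        for e_time, e_elem, label in events:
            if not 0 <= a_time - e_time <= max_lag:
                continue
            hops = hop_distance(graph, a_elem, e_elem)
            if hops is not None and hops <= max_hops:
                pairs.append((a_elem, a_time, label, e_elem, e_time, hops))
    return pairs


if __name__ == "__main__":
    # Hypothetical web-shop EA: business -> application -> infrastructure.
    ea = build_graph([
        ("Sales Process", "Web Shop"),
        ("Web Shop", "Checkout Service"),
        ("Checkout Service", "App Server"),
    ])
    conversion_rate = [10] * 20 + [6] * 20  # synthetic level shift at t = 20
    anomalies = [(t, "Sales Process") for t in detect_level_shifts(conversion_rate)]
    events = [
        (18, "Checkout Service", "deployment"),
        (19, "CRM", "marketing campaign"),  # not linked in the EA graph
        (5, "App Server", "deployment"),    # too long before the anomaly
    ]
    for pair in correlate(anomalies, events, ea):
        print(pair)
```

In this toy setup, only the checkout deployment is reported. The marketing campaign concerns an element with no documented path to the sales process, and the early app-server deployment falls outside the lag window. This illustrates how restricting candidates to known EA relationships prunes the correlation search.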
