GWAD: Greedy Workflow Graph Anomaly Detection Framework for System Traces

System traces are collections of time-stamped messages recorded by the operating system while the system is running. Analyzing these traces is crucial for tasks such as fault finding. Detecting anomalous system behavior is especially important in safety-critical and time-sensitive systems, where delayed detection can lead to catastrophic outcomes. We therefore focus on developing a lightweight and explainable approach for such systems. Given a set of system traces recorded under normal and anomalous conditions, trace-based anomaly detection aims to classify a given trace as anomalous or normal. In this work, we introduce GWAD, a greedy workflow graph framework for anomaly detection, which uses a novel greedy graph construction approach for both offline and online anomaly detection in system traces. Our approach uses both the order in which events occur and the time intervals between their occurrences to learn normal system behavior. We propose two approaches: the first classifies a trace offline as anomalous or normal using event occurrence workflow graphs, and the second is an online streaming algorithm that monitors events as they occur in real time to detect anomalies, thereby increasing system resilience. Our approach also provides reasoning for the cause of anomalous behavior. We show that GWAD outperforms traditional state-of-the-art models. The paper demonstrates the technical feasibility and viability of GWAD through multiple case studies using traces from a field-tested hexacopter.
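
To make the idea concrete, the following is a minimal Python sketch of a workflow graph that records both event order and inter-event time. It is not the GWAD algorithm itself, whose greedy construction is not described in the abstract. It assumes a trace is a list of `(timestamp, event)` pairs, that an edge stores the minimum and maximum time gap seen in normal traces, and that a trace is flagged when it contains an unseen transition or an out-of-range gap. The names `build_workflow_graph`, `check_trace`, `OnlineMonitor`, and the `slack` parameter are hypothetical.

```python
def build_workflow_graph(normal_traces):
    """Learn, for each transition (a -> b) seen in normal traces,
    the smallest and largest time gap between the two events.
    Illustrative assumption: a trace is a list of (timestamp, event)."""
    graph = {}
    for trace in normal_traces:
        for (t0, a), (t1, b) in zip(trace, trace[1:]):
            gap = t1 - t0
            lo, hi = graph.get((a, b), (gap, gap))
            graph[(a, b)] = (min(lo, gap), max(hi, gap))
    return graph


def explain_step(graph, prev, curr, slack=0):
    """Return a reason string if the step prev -> curr looks anomalous, else None."""
    (t0, a), (t1, b) = prev, curr
    if (a, b) not in graph:
        return f"unseen transition {a} -> {b} at t={t1}"
    lo, hi = graph[(a, b)]
    gap = t1 - t0
    if not (lo - slack <= gap <= hi + slack):
        return f"gap {gap} for {a} -> {b} outside [{lo}, {hi}] at t={t1}"
    return None


def check_trace(graph, trace, slack=0):
    """Offline check: collect reasons for every suspicious step.
    An empty list means no anomaly was found under this simple model."""
    reasons = []
    for prev, curr in zip(trace, trace[1:]):
        reason = explain_step(graph, prev, curr, slack)
        if reason is not None:
            reasons.append(reason)
    return reasons


class OnlineMonitor:
    """Streaming check: feed events one at a time as they occur."""

    def __init__(self, graph, slack=0):
        self.graph = graph
        self.slack = slack
        self.prev = None  # last (timestamp, event) observed

    def observe(self, timestamp, event):
        """Return a reason string if this event is anomalous, else None."""
        curr = (timestamp, event)
        reason = None
        if self.prev is not None:
            reason = explain_step(self.graph, self.prev, curr, self.slack)
        self.prev = curr
        return reason
```

In this sketch the offline and online checks share the same per-step test, so the reason strings returned for a flagged step double as a simple explanation of what deviated from the learned normal behavior.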
