An Imputation-based Augmented Anomaly Detection from Large Traces of Operating System Events

Software debugging, audit, and compliance testing are some of the tasks performed using the execution traces of an operating system. These tasks, however, gather information about the behavior of the software relative to its design aims. In this work, we instead analyze the execution traces of an embedded real-time operating system (RTOS) to model the behavior of the physical system that the software application manages through the embedded operating system. For an event-triggered embedded RTOS that controls a bespoke system such as an unmanned aerial vehicle (UAV), the events in the execution traces are directly linked to the operation of the controlled physical system. We therefore hypothesize that the frequency of events (method/function calls) per observation is a useful feature for modeling the behavior of that physical system. Furthermore, we tackle the lack of data that sufficiently captures the degree of aberration that may occur in a system. We model augmentation through artificial missingness and imputation, generating new cases from the data we have. We implement missingness using the missing completely at random (MCAR) strategy and use overall single mean imputation at the imputation stage: the method takes the average of the remaining values in the dataset and replaces the missing values with this average. This augmentation leads to an imputation-based augmented anomaly detection model that enables us to expand both the training and the validation/test data. Expanding the test data reduces the misclassification that results from the non-parametric nature of the anomalies that may occur on the physical system, while using injected data for training lets us stress-test our model.
We test our model with traces of a real-time operating system kernel of a UAV, and the results show that the model achieves an improved anomalous trace detection accuracy even under the induced missingness.
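
The augmentation pipeline described above (event-frequency features, MCAR missingness, overall single mean imputation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the event names in the demo, the missingness rate, and the use of `None` as the missing-value marker are all assumptions made for illustration.

```python
import random
from collections import Counter


def event_frequencies(trace, vocabulary):
    """Count how often each event name occurs in one trace (one observation)."""
    counts = Counter(trace)
    return [float(counts.get(name, 0)) for name in vocabulary]


def inject_mcar(matrix, rate, rng):
    """Return a copy in which each cell is independently set to None with
    probability `rate`, regardless of its value (missing completely at random)."""
    return [[None if rng.random() < rate else v for v in row] for row in matrix]


def overall_mean_impute(matrix):
    """Replace every None with the mean of all remaining observed values
    in the whole dataset (overall single mean imputation)."""
    observed = [v for row in matrix for v in row if v is not None]
    if not observed:
        raise ValueError("no observed values left to average")
    mean = sum(observed) / len(observed)
    return [[mean if v is None else v for v in row] for row in matrix]


def augment(matrix, rate, copies, seed=0):
    """Generate `copies` new datasets by MCAR deletion followed by imputation."""
    rng = random.Random(seed)
    return [overall_mean_impute(inject_mcar(matrix, rate, rng))
            for _ in range(copies)]


if __name__ == "__main__":
    # Hypothetical event names; real traces would come from the RTOS kernel.
    vocabulary = ["msg_send", "msg_receive", "thread_ready"]
    traces = [
        ["msg_send", "msg_receive", "msg_send", "thread_ready"],
        ["thread_ready", "thread_ready", "msg_receive"],
    ]
    features = [event_frequencies(t, vocabulary) for t in traces]
    for new_dataset in augment(features, rate=0.2, copies=2):
        print(new_dataset)
```

Each augmented copy keeps the shape of the original feature matrix, so it can be appended to either the training set or the validation/test set. The seed is fixed only to make the sketch reproducible.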
