anomalyDetection: Implementation of Augmented Network Log Anomaly Detection Procedures

As the number of cyber-attacks grows daily, so does the delay in threat detection. For instance, in 2015 the Office of Personnel Management (OPM) discovered that approximately 21.5 million individual records of Federal employees and contractors had been stolen. On average, the time between an attack and its discovery is more than 200 days; in the case of the OPM breach, the attack had been going on for almost a year. Cyber analysts are currently expected to inspect numerous potential incidents every day, but have neither the time nor the resources to do so. anomalyDetection aims to shorten the time frame in which anomalous cyber activities go unnoticed and to aid in the efficient discovery of these anomalous transactions among the millions of daily logged events by i) providing an efficient means of pre-processing and aggregating cyber data for analysis by employing a tabular vector transformation and handling multicollinearity concerns; ii) offering numerous built-in multivariate statistical functions, such as Mahalanobis distance, factor analysis, and principal components analysis, to identify anomalous activity; and iii) incorporating the pipe operator (%>%) so that it works well in the tidyverse workflow. Combined, these features give cyber analysts an efficient and simplified approach to breaking network events into time-segment blocks and identifying periods associated with suspected anomalies for further evaluation.
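The workflow described in the abstract (split the log into blocks, tabulate each block into a count vector, deal with multicollinearity, then score blocks by Mahalanobis distance) can be illustrated with a minimal sketch. This is plain Python written for illustration only and is not the anomalyDetection package's API: the function names (`tabulate_blocks`, `drop_last_level`, `mahalanobis_sq`), the toy firewall events, and the block size of 4 are all assumptions. Dropping one level per field is a simple stand-in for multicollinearity handling, not necessarily the adjustment the package performs.

```python
from collections import Counter

def tabulate_blocks(events, block_size, fields):
    """Split events into consecutive blocks; count each (field, value) per block."""
    levels = sorted({(f, e[f]) for e in events for f in fields})
    rows = []
    for start in range(0, len(events), block_size):
        block = events[start:start + block_size]
        counts = Counter((f, e[f]) for e in block for f in fields)
        rows.append([counts[level] for level in levels])
    return levels, rows

def drop_last_level(levels, rows):
    """Drop one level per field: within a field the counts sum to the block
    size, so the full table has an exactly singular covariance matrix."""
    last = {f: v for f, v in levels}  # levels are sorted, so the last value wins
    keep = [i for i, (f, v) in enumerate(levels) if last[f] != v]
    return [[row[i] for i in keep] for row in rows]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of each row from the column means,
    using the sample covariance of the rows themselves."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    dev = [[r[j] - mean[j] for j in range(p)] for r in rows]
    cov = [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(p)]
           for i in range(p)]
    return [sum(x * y for x, y in zip(d, solve(cov, d))) for d in dev]

# Hypothetical firewall log: 20 events, i.e. 5 blocks of 4 events each.
raw = [
    "allow tcp", "allow tcp", "allow tcp", "deny udp",
    "allow tcp", "allow tcp", "allow udp", "allow udp",
    "allow tcp", "allow tcp", "allow udp", "deny udp",
    "deny tcp", "deny tcp", "deny tcp", "deny tcp",   # unusual block
    "allow tcp", "allow tcp", "allow tcp", "allow udp",
]
events = [dict(zip(("action", "proto"), line.split())) for line in raw]

levels, rows = tabulate_blocks(events, 4, ["action", "proto"])
reduced = drop_last_level(levels, rows)
d2 = mahalanobis_sq(reduced)
suspect = max(range(len(d2)), key=d2.__getitem__)  # block with the largest distance
print("most anomalous block:", suspect)
```

In practice an analyst would rank or threshold the distances rather than take only the maximum, and would then review the raw events inside the flagged blocks.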
