Recompose Event Sequences vs. Predict Next Events: A Novel Anomaly Detection Approach for Discrete Event Logs

One of the most challenging problems in intrusion detection is anomaly detection for discrete event logs. While most earlier work applied unsupervised learning to engineered features, recent work has begun to address this challenge by applying deep learning to abstractions of discrete event entries. Inspired by natural language processing, researchers have proposed LSTM-based anomaly detection models. These models try to predict upcoming events and raise an anomaly alert when a prediction fails to meet a certain criterion (for example, when the observed event is not among the top-k events the model predicted). However, such a predict-next-event methodology has a fundamental limitation: event predictions may not fully exploit the distinctive characteristics of sequences, which leads to high numbers of false positives (FPs) and false negatives (FNs). It is critical to also examine the structure of whole sequences and the bi-directional causality among individual events. To this end, we propose a new methodology: detecting anomalies by recomposing event sequences. We present DabLog, an LSTM-based Deep Autoencoder-Based anomaly detection method for discrete event Logs. The fundamental difference is that, rather than predicting upcoming events, our approach determines whether a sequence is normal or abnormal by analyzing (encoding) and reconstructing (decoding) the given sequence. Our evaluation shows that this methodology significantly reduces the numbers of FPs and FNs and thus achieves a higher F1 score.
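The alerting criteria above can be contrasted on toy inputs. The following is a minimal sketch that assumes the trained models already exist and passes in only their output distributions: for the predict-next-event rule, one probability distribution over event keys for the next position; for the recompose rule, one distribution per position of the reconstructed sequence. The top-k membership test follows the check used by DeepLog-style next-event predictors; applying it position by position to a reconstruction is an assumption made for illustration, not necessarily DabLog's exact criterion. The function names `top_k`, `predict_next_alert`, and `recompose_alert` and all probability values are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def top_k(probs: Sequence[float], k: int) -> set[int]:
    """Event keys (indices) of the k highest-probability entries."""
    ranked = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
    return set(ranked[:k])


def predict_next_alert(next_probs: Sequence[float], observed: int, k: int) -> bool:
    """Predict-next-event rule: alert when the observed next event is not
    among the k events the (hypothetical) LSTM predictor ranked highest."""
    return observed not in top_k(next_probs, k)


def recompose_alert(recon_probs: Sequence[Sequence[float]],
                    sequence: Sequence[int], k: int) -> bool:
    """Recompose rule (assumed for illustration): the (hypothetical)
    autoencoder emits one distribution per position of the reconstructed
    sequence; flag the whole sequence if any original event falls outside
    the top-k at its own position."""
    if len(recon_probs) != len(sequence):
        raise ValueError("need one distribution per event in the sequence")
    return any(event not in top_k(p, k) for p, event in zip(recon_probs, sequence))


# Toy usage with 4 event keys (0-3); all probabilities are made up.
next_probs = [0.05, 0.70, 0.20, 0.05]
print(predict_next_alert(next_probs, observed=1, k=2))  # False: 1 is in the top-2
print(predict_next_alert(next_probs, observed=3, k=2))  # True: 3 is not

seq = [0, 2, 1]
recon = [[0.90, 0.05, 0.03, 0.02],
         [0.10, 0.10, 0.75, 0.05],
         [0.40, 0.05, 0.05, 0.50]]  # event 1 at the last position is poorly reconstructed
print(recompose_alert(recon, seq, k=2))  # True
```

The difference lies in what each check can see: the next-event rule judges each event only from the events before it, whereas the recompose rule judges every event against a reconstruction of the whole sequence, so context on both sides of an event can contribute.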
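Why fewer FPs and FNs yield a higher F1 score follows from the standard definition in terms of true positives (TP), false positives (FP), and false negatives (FN):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}.
```

For a fixed set of truly anomalous sequences, TP + FN is constant, so each false negative removed becomes a true positive, and each false positive removed shrinks the denominator; assuming TP > 0, both changes raise F1.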
