LogNADS: Network anomaly detection scheme based on log semantics representation

Abstract: Semantics-aware anomaly detection based on logs has attracted much attention. However, existing methods that aggregate all word vectors by weighting may lose the semantic relationships carried by word order and cannot guarantee a unique representation, while methods that preserve word order by concatenating all word vectors can incur a high computation time cost. To address these issues and further improve sequential anomaly detection, this paper proposes LogNADS, a network anomaly detection scheme built on a novel log semantics representation method and an adaptive sequence data construction method. LogNADS first discards useless words and then selects theme words, which preserves the log abstraction while keeping the time cost low. It then concatenates the theme words’ vectors in their original word order to maintain a unique representation and avoid losing word order. Furthermore, to better detect sequential anomalies, we use a sliding window scheme and design a method that computes the optimal window size so that log sequences are constructed self-adaptively; an LSTM is then built to extract the temporal characteristics of the log sequences. Experimental results on the public benchmark HDFS and BGL datasets demonstrate the effectiveness of LogNADS in comparison with other state-of-the-art methods in terms of detection accuracy and time cost. Moreover, statistical significance tests confirm its superior performance.
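
The representation and sequence-construction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how theme words are chosen or how the optimal window size is computed, so the sketch assumes a simple rarity heuristic (lowest document frequency) for theme-word selection and takes the window size as a plain parameter. The stopword list, the hash-based `toy_vector` embedding, and all function names are hypothetical stand-ins. The LSTM stage is omitted.

```python
import hashlib
from collections import Counter

STOPWORDS = {"the", "a", "of", "to", "for", "is", "from"}  # hypothetical list
DIM = 4  # toy embedding dimension (assumption)
K = 2    # theme words kept per log message (assumption)

def toy_vector(word, dim=DIM):
    """Deterministic stand-in for a pretrained word embedding."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def tokenize(message):
    """Discard 'useless' words: here, stopwords and non-alphabetic tokens."""
    return [w for w in message.lower().split()
            if w.isalpha() and w not in STOPWORDS]

def select_theme_words(tokens, doc_freq, k=K):
    """Keep the k rarest tokens (assumed heuristic), in original word order."""
    ranked = sorted(range(len(tokens)), key=lambda i: (doc_freq[tokens[i]], i))
    keep = sorted(ranked[:k])  # restore original positions
    return [tokens[i] for i in keep]

def represent(message, doc_freq, k=K, dim=DIM):
    """Concatenate theme-word vectors in word order; zero-pad to k * dim."""
    theme = select_theme_words(tokenize(message), doc_freq, k)
    vec = [x for w in theme for x in toy_vector(w, dim)]
    return vec + [0.0] * (k * dim - len(vec))

def sliding_windows(items, size, step=1):
    """Build fixed-size log sequences; size is given, not optimized here."""
    return [items[i:i + size] for i in range(0, len(items) - size + 1, step)]

# Made-up log lines, for illustration only.
logs = [
    "Receiving block blk_1 from node",
    "Deleting block blk_2 file",
    "Receiving block blk_3 from node",
]
doc_freq = Counter(w for line in logs for w in set(tokenize(line)))
vectors = [represent(line, doc_freq) for line in logs]
sequences = sliding_windows(vectors, size=2)
```

Each entry of `sequences` is a window of consecutive log vectors, which is the shape of input a sequence model such as an LSTM would consume. Because the vectors are concatenated rather than averaged, two messages containing the same theme words in a different order get different representations.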
