An Anomaly Detection Approach of Part-of-Speech Log Sequence via Population Based Training

Log data is a valuable resource for understanding system status. Logs record the running status of a computer system and are commonly used to identify performance issues and malfunctions. Sequential anomaly detection on logs is crucial for building a secure and stable system and helps to discover, locate, and analyze system failures. In this paper, we propose a new log sequential anomaly detection method that is based on natural language processing techniques and trained with the Population Based Training (PBT) algorithm, and that makes full use of the semantic information in log templates to analyze log sequences. In the feature extraction stage, a Part-of-Speech (PoS) weight mechanism is first employed to improve the quality of the numerical representation of each log template, and TextCNN is then used to extract salient information from the log template vectors. In the anomaly detection stage, the combination of TextCNN and an LSTM neural network improves the accuracy of log sequential anomaly detection. In addition, the proposed method jointly trains the parameters of the PoS weight mechanism and the parameters of the anomaly detection neural network through the PBT algorithm, which speeds up model convergence and improves detection accuracy. We tested the model on four data sets and compared it with two state-of-the-art models; the experimental results show that the proposed method performs well relative to other log anomaly detection methods.
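The interplay between PoS weights and PBT can be illustrated with a minimal Python sketch. It is not the paper's implementation: the weighted-average form of the template vector, the tag set, the weight values, the toy embeddings, and the names `template_vector` and `pbt_step` are all assumptions made for illustration. The exploit/explore step follows the general idea of PBT (poorly performing members copy settings from well performing ones and perturb them); a full PBT run would also copy the network weights and retrain between steps.

```python
import random

# Hypothetical PoS weights. In the described method these are tuned by PBT;
# the tags and values below are made up for illustration.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.5, "OTHER": 0.1}


def template_vector(tagged_tokens, embeddings, pos_weights):
    """Return a PoS-weighted average of the word vectors of one log template.

    tagged_tokens: list of (token, pos_tag) pairs, assumed to come from a tagger.
    embeddings: dict mapping token -> list of floats (toy word vectors).
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token, tag in tagged_tokens:
        if token not in embeddings:
            continue  # skip out-of-vocabulary tokens
        w = pos_weights.get(tag, pos_weights["OTHER"])
        for i, value in enumerate(embeddings[token]):
            total[i] += w * value
        weight_sum += w
    if weight_sum == 0.0:
        return total  # all-zero vector when no token could be embedded
    return [x / weight_sum for x in total]


def pbt_step(population, rng, factors=(0.8, 1.2), fraction=0.25):
    """One exploit/explore step over a population of PoS weight settings.

    Each member is a dict with "pos_weights" and a validation "score".
    The worst `fraction` of members copy the weights of a randomly chosen
    top member (exploit) and multiply each weight by a random factor (explore).
    """
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cut = max(1, int(len(ranked) * fraction))
    if len(ranked) < 2 * cut:
        return population  # population too small to split into top and bottom
    top, bottom = ranked[:cut], ranked[-cut:]
    for member in bottom:
        donor = rng.choice(top)
        member["pos_weights"] = {
            tag: w * rng.choice(factors)
            for tag, w in donor["pos_weights"].items()
        }
    return population


if __name__ == "__main__":
    tagged = [("receiving", "VERB"), ("block", "NOUN"), ("from", "OTHER")]
    toy_embeddings = {
        "receiving": [1.0, 0.0],
        "block": [0.0, 1.0],
        "from": [1.0, 1.0],
    }
    print(template_vector(tagged, toy_embeddings, POS_WEIGHTS))

    population = [
        {"pos_weights": dict(POS_WEIGHTS), "score": 0.90},
        {"pos_weights": {t: 0.5 for t in POS_WEIGHTS}, "score": 0.40},
    ]
    pbt_step(population, random.Random(0))
    print(population[1]["pos_weights"])
```

In this sketch the resulting template vectors would be fed to the TextCNN and LSTM stages, and each member's score would come from validating that member's detector; both steps are omitted here.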
