Learning-Based Anomaly Detection Using Log Files with Sequential Relationships

Modern IT systems have been transitioning from traditional on-premises solutions to a dynamic mixture of on-premises and off-premises solutions. This transition has included a trend toward running more systems on software-defined resources. The ease of setting up new software-defined servers and systems has increased both IT system complexity and the amount of log data generated. Automatic log analysis has become a subject of interest because of the difficulties of manual log analysis for intrusion detection and root-cause analysis. This paper therefore proposes and tests a sequence-based anomaly detection method. The work was done in collaboration with the Swedish Social Insurance Agency’s IT department. Real system log data, with high privacy requirements and limited available information, was generated for training and testing, with expected time regions of anomalous behavior. The proposed anomaly detection model performed at a state-of-the-art level and accurately detected certain error types, showing the potential of the approach when applied directly to a real-world system.
