Machine Learning Based Anomaly Detection of Log Files Using Ensemble Learning and Self-Attention

Modern enterprise IT systems generate large amounts of log data recording system state, errors, and performance metrics. Manual analysis of this data becomes harder as the systems grow more complex, so machine learning based anomaly detection of system logs is becoming a vital component of system management. Existing log anomaly detection models commonly rely on learning the normal behavior of the target system in order to detect deviations from it. They are, however, limited by the often scarce knowledge of, and labeled data from, the target system. This paper therefore proposes a general anomaly detection method that requires little or no knowledge of the target system. The method assumes that the log data of different systems share semantic similarities, so that labeled log data from other systems can be used to train the anomaly detection model. The model uses self-attention transformers and ensemble learning techniques to learn a semantic representation of normal and abnormal log messages. The proposed method achieves performance comparable to other log anomaly detection methods while requiring little knowledge of the target system.
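The cross-system idea in the abstract, as an added example that is not code from the paper: an ensemble is trained only on labeled lines from one hypothetical "source" system and then scored on lines from a different hypothetical "target" system. To stay self-contained it substitutes three cheap base learners (unigram and bigram naive Bayes plus a vote over class-exclusive tokens) for the paper's self-attention transformer, so it can only exploit exact word overlap after masking IPs, paths and numbers, which is the weakest form of the paper's semantic-similarity assumption. All log lines, system names and function names are constructed for the example. It also shows the two checks a practitioner wants before trusting a transferred model: how much of the target vocabulary the source model has ever seen, and abstaining on lines that share no vocabulary at all.

```python
import math
import re
from collections import Counter

# Constructed labelled lines from a hypothetical source system (a web front end).
# Invented for this example, not data from the paper. 0 = normal, 1 = anomalous.
SOURCE_LOGS = [
    ("GET /index.html 200 served in 12ms", 0),
    ("connection accepted from 10.0.0.5:44312", 0),
    ("session created for user alice", 0),
    ("cache hit for /static/app.js", 0),
    ("worker 3 started successfully", 0),
    ("health check passed", 0),
    ("request completed status 200", 0),
    ("tls handshake completed with 10.0.0.8", 0),
    ("connection timeout to upstream 10.0.1.2:8080", 1),
    ("error: failed to open file /var/www/config.yml", 1),
    ("worker 5 crashed with exit code 139", 1),
    ("request failed status 500 internal error", 1),
    ("out of memory while handling request", 1),
    ("tls handshake failed with 10.0.0.9: certificate expired", 1),
    ("error: too many open files", 1),
    ("health check failed", 1),
]
# Constructed lines from a different hypothetical target system (a database).
# The labels are used only for scoring, never for training.
TARGET_LOGS = [
    ("checkpoint completed in 340ms", 0),
    ("connection accepted from 10.0.2.7:50011", 0),
    ("session created for user bob", 0),
    ("health check passed", 0),
    ("error: failed to open file /var/lib/db/wal.log", 1),
    ("connection timeout to replica 10.0.2.9:5432", 1),
    ("out of memory during sort", 1),
    ("worker 2 crashed with exit code 11", 1),
]

IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
PATH_RE = re.compile(r"/[\w./-]+")
NUM_RE = re.compile(r"\b\d+\b")


def tokenize(line):
    """Mask system-specific values (IPs, paths, numbers) so only semantic words remain."""
    line = IP_RE.sub("<ip>", line.lower())
    line = PATH_RE.sub("<path>", line)
    line = NUM_RE.sub("<num>", line)
    return re.findall(r"<\w+>|[a-z]+", line)


def unigrams(line):
    return tokenize(line)


def bigrams(line):
    toks = tokenize(line)
    return list(zip(toks, toks[1:]))


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over one feature view."""

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, lines, labels):
        self.counts, self.docs = {0: Counter(), 1: Counter()}, Counter(labels)
        for line, y in zip(lines, labels):
            self.counts[y].update(self.featurize(line))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}

    def log_odds(self, line):
        """log P(anomalous | line) - log P(normal | line); unseen features are smoothed."""
        v, lo = len(self.vocab), math.log(self.docs[1] / self.docs[0])
        for f in self.featurize(line):
            lo += math.log((self.counts[1][f] + 1) / (self.totals[1] + v))
            lo -= math.log((self.counts[0][f] + 1) / (self.totals[0] + v))
        return lo

    def predict(self, line):
        return int(self.log_odds(line) > 0)


class ExclusiveTokens:
    """Votes on tokens that occurred in exactly one class of the source logs."""

    def fit(self, lines, labels):
        seen = {0: set(), 1: set()}
        for line, y in zip(lines, labels):
            seen[y].update(tokenize(line))
        self.only = {c: seen[c] - seen[1 - c] for c in (0, 1)}

    def predict(self, line):
        toks = set(tokenize(line))
        return int(len(toks & self.only[1]) > len(toks & self.only[0]))


class Ensemble:
    """Majority vote; abstains (None) when a line shares no vocabulary with the source."""

    def __init__(self, members):
        self.members = members

    def fit(self, lines, labels):
        self.vocab = {t for line in lines for t in tokenize(line)}
        for m in self.members:
            m.fit(lines, labels)
        return self

    def predict(self, line):
        if not any(t in self.vocab for t in tokenize(line)):
            return None
        votes = sum(m.predict(line) for m in self.members)
        return int(2 * votes > len(self.members))


def train_source_model():
    lines, labels = [l for l, _ in SOURCE_LOGS], [y for _, y in SOURCE_LOGS]
    members = [NaiveBayes(unigrams), NaiveBayes(bigrams), ExclusiveTokens()]
    return Ensemble(members).fit(lines, labels)


MODEL = train_source_model()


def classify(line):
    return MODEL.predict(line)


def vocabulary_coverage(model, lines):
    """Fraction of target tokens the source model has seen: check this before trusting transfer."""
    toks = [t for line in lines for t in tokenize(line)]
    return sum(t in model.vocab for t in toks) / len(toks)


def evaluate_transfer(model=MODEL, logs=TARGET_LOGS):
    tp = fp = fn = unknown = 0
    for line, y in logs:
        p = model.predict(line)
        unknown += p is None
        tp += p == 1 and y == 1
        fp += p == 1 and y == 0
        fn += p == 0 and y == 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "unknown": unknown,
            "coverage": vocabulary_coverage(model, [l for l, _ in logs])}


if __name__ == "__main__":
    print(evaluate_transfer())
    print(classify("replication lag high"))  # no shared vocabulary: abstains
```

On this toy data the transfer is perfect because every target line reuses words such as "failed", "crashed" or "completed" that the source system already labelled; the coverage figure (about 88% of target tokens seen before) is what makes that credible. A line like "replication lag high" shares nothing with the source and the ensemble abstains rather than guessing, which is the case the paper addresses with learned semantic representations instead of exact token overlap.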