PLELog: Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation

PLELog is a novel approach for log-based anomaly detection via probabilistic label estimation. It is designed to detect anomalies in unlabeled logs effectively while avoiding the manual labeling effort needed to generate training data. We represent the semantic information within log events as fixed-length vectors and apply HDBSCAN to cluster log sequences automatically. We then propose a Probabilistic Label Estimation approach to reduce the noise introduced by labeling errors, and feed the "labeled" instances into an attention-based GRU network for training. We conducted an empirical study to evaluate the effectiveness of PLELog on two open-source log datasets (i.e., HDFS and BGL). The results demonstrate the effectiveness of PLELog. In particular, PLELog has been applied to two real-world systems from a university and a large corporation, further demonstrating its practicability.
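The cluster-then-estimate idea above, as an added pure-Python sketch that is not the PLELog implementation. It assumes a few sequences are already known to be normal, substitutes word-count vectors for the semantic embeddings and a similarity-threshold connected-components clustering for HDBSCAN, and uses a constructed scoring rule; the function names, the 0.8 threshold and the sample logs are all assumptions.

```python
import math
from collections import Counter

def seq_vector(seq, vocab):  # fixed-length word-count vector for one log sequence
    counts = Counter(w for event in seq for w in event.split())
    return [float(counts[w]) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def cluster(vecs, min_sim):
    """Connected components of the similarity graph (simple stand-in for HDBSCAN)."""
    ids, next_id = [-1] * len(vecs), 0
    for start in range(len(vecs)):
        if ids[start] != -1:
            continue
        ids[start], stack = next_id, [start]
        while stack:
            i = stack.pop()
            for j in range(len(vecs)):
                if ids[j] == -1 and cosine(vecs[i], vecs[j]) >= min_sim:
                    ids[j] = next_id
                    stack.append(j)
        next_id += 1
    return ids

def estimate_labels(seqs, known_normal, min_sim=0.8):
    """Return (cluster ids, P(anomalous)) per sequence; known_normal = labeled indices."""
    vocab = sorted({w for s in seqs for e in s for w in e.split()})
    vecs = [seq_vector(s, vocab) for s in seqs]
    ids = cluster(vecs, min_sim)
    normal_clusters = {ids[i] for i in known_normal}
    probs = []
    for i, v in enumerate(vecs):
        if i in known_normal:
            probs.append(0.0)
            continue
        best = max(cosine(v, vecs[k]) for k in known_normal)
        # Same cluster as a labeled normal leans normal, else anomalous; similarity softens it.
        probs.append(0.5 * (1 - best) if ids[i] in normal_clusters else 1 - 0.5 * best)
    return ids, probs

# Constructed sample (HDFS-like event text); indices 0 and 1 are the labeled normals.
OK = ["receiving block", "received block", "verification succeeded for block"]
ERR = ["exception writing block", "connection reset by peer"]
SEQS = [OK, ["receiving block"] + OK, OK + ["deleting block"],
        ["receiving block"] + ERR, ERR + ["exception writing block"]]
print(estimate_labels(SEQS, known_normal={0, 1}))
```

The soft labels, rather than hard 0/1 cluster labels, are what a downstream classifier would be trained on, so a sequence sitting near the normal set contributes a weaker anomaly signal than one far from it.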

[1] Leland McInnes, et al. Accelerated Hierarchical Density Based Clustering, 2017, 2017 IEEE International Conference on Data Mining Workshops (ICDMW).

[2] Xuyuan Dong, et al. Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation, 2021, 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE).

[3] Xu Zhang, et al. Robust log-based anomaly detection on unstable log data, 2019, ESEC/SIGSOFT FSE.

[4] Feifei Li, et al. DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning, 2017, CCS.