Prefix-Graph: A Versatile Log Parsing Approach Merging Prefix Tree with Probabilistic Graph

Logs play an important part in analyzing system behavior and diagnosing system failures. As the basic step of log analysis, log parsing converts raw log messages into structured log templates. However, existing log parsing approaches are not adaptive or versatile enough to ensure high accuracy across all types of datasets. In particular, they require manually designed regular expressions or manually tuned hyper-parameters to reach their best performance. In this paper, we propose Prefix-Graph, a versatile online log parsing approach. Prefix-Graph is a probabilistic graph structure that extends the prefix tree. It iteratively merges pairs of branches whose probability distributions are highly similar, and represents log templates as combinations of cut-edges along the root-to-leaf paths of the graph. Since no domain knowledge is used and all parameters are fixed, Prefix-Graph can be applied to different log datasets without any additional manual work. We evaluate our approach on 10 real-world datasets and 117 GB of log messages obtained from Huawei. The experimental results demonstrate that Prefix-Graph achieves the highest average accuracy (0.975) and the smallest standard deviation (0.037) among the compared methods. Our approach is superior to the baseline methods in terms of adaptability and versatility.
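The abstract does not give the construction details, so the following is only a minimal sketch of the underlying idea, not the authors' algorithm. It builds a prefix tree with pass-through counts over whitespace-tokenized messages and compares the outgoing-token distributions of two sibling branches. The names `Node`, `build_prefix_tree`, `child_distribution`, and `distribution_overlap` are hypothetical, and the overlap measure (shared probability mass) is an assumed stand-in for whatever similarity criterion Prefix-Graph actually uses.

```python
from collections import defaultdict


class Node:
    """One prefix-tree node; `count` is the number of messages passing through it."""

    def __init__(self):
        self.count = 0
        self.children = defaultdict(Node)


def build_prefix_tree(messages):
    """Insert whitespace-tokenized messages into a prefix tree with counts."""
    root = Node()
    for message in messages:
        node = root
        node.count += 1
        for token in message.split():
            node = node.children[token]
            node.count += 1
    return root


def child_distribution(node):
    """Empirical probability of each outgoing token at `node`."""
    total = sum(child.count for child in node.children.values())
    if total == 0:
        return {}
    return {tok: child.count / total for tok, child in node.children.items()}


def distribution_overlap(p, q):
    """Assumed similarity measure: probability mass shared by p and q (0 to 1)."""
    return sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in set(p) | set(q))


# Illustrative log lines (made up for this sketch, not taken from the paper).
logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 10.0.0.2 closed",
    "Session opened for user root",
]
root = build_prefix_tree(logs)
frm = root.children["Connection"].children["from"]

# Distribution over the tokens that follow "Connection from".
dist = child_distribution(frm)

# The two IP branches continue identically, so under this assumed measure
# they would be candidates for merging into a single variable position.
a = child_distribution(frm.children["10.0.0.1"])
b = child_distribution(frm.children["10.0.0.2"])
overlap = distribution_overlap(a, b)
```

In this toy input the two branches under `from` differ only in the IP token and share all downstream mass, which is the kind of situation where merging branches would collapse a variable field into one template position.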
