LogParse: Making Log Parsing Adaptive through Word Classification

Logs are one of the most valuable data sources for maintaining large-scale services (e.g., social networks, search engines). Log parsing serves as the first step towards automated log analysis. However, current log parsing methods are not adaptive. Without intra-service adaptiveness, log parsing cannot handle software/firmware upgrades, because previously learned templates cannot match new types of logs. In addition, without cross-service adaptiveness, the logs of a new type of service cannot be accurately parsed when that service is first deployed. We propose LogParse, an adaptive log parsing framework that supports intra-service and cross-service incremental template learning and update. LogParse turns the template generation problem into a word classification problem and learns the features of template words and variable words. We evaluate LogParse on four public production log datasets. The results demonstrate that LogParse supports accurate adaptive template update (parsing accuracy increases from 0.559 to nearly 1.0), and that a trained LogParse adapts to parsing the logs of a brand-new service. Because of LogParse’s adaptiveness, we also apply it to log compression, and have deployed this log compression at a top cloud service provider. We package LogParse into an open-source toolkit.
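
The idea of treating template generation as word classification can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three character-level features, the Bernoulli naive Bayes classifier, the function names, and the toy training words are hypothetical and are not taken from LogParse itself. The sketch only shows the general shape of the approach: label each word as a template word or a variable word, then replace variable words with a wildcard.

```python
import math
import re
from collections import defaultdict

def word_features(word):
    """Hypothetical character-level features (not the paper's feature set)."""
    return (
        any(c.isdigit() for c in word),        # contains a digit
        bool(re.search(r"[./:_\-]", word)),    # contains path/address punctuation
        word.isalpha(),                        # purely alphabetic
    )

def train(labelled_words):
    """Fit a tiny Bernoulli naive Bayes model from (word, label) pairs."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0, 0, 0])
    for word, label in labelled_words:
        class_counts[label] += 1
        for i, on in enumerate(word_features(word)):
            feature_counts[label][i] += int(on)
    return class_counts, feature_counts

def classify(word, model):
    """Return the most likely label ('template' or 'variable') for a word."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, on in enumerate(word_features(word)):
            p_on = (feature_counts[label][i] + 1) / (n + 2)  # Laplace smoothing
            score += math.log(p_on if on else 1 - p_on)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def to_template(log_line, model):
    """Keep template words and replace variable words with '*'."""
    return " ".join(
        w if classify(w, model) == "template" else "*"
        for w in log_line.split()
    )

# Toy labelled words, standing in for words taken from already-known templates.
labelled = [
    ("Connection", "template"), ("from", "template"), ("closed", "template"),
    ("port", "template"), ("user", "template"), ("failed", "template"),
    ("10.0.0.1", "variable"), ("8080", "variable"),
    ("/var/log/app.log", "variable"), ("42", "variable"),
]
model = train(labelled)

# A log line whose words never appeared in the training set.
print(to_template("Interface eth0 down after 300 seconds", model))
```

Because the classifier looks at word-level features instead of memorizing whole templates, it can label words in a log line it has never seen. That property is what the sketch is meant to convey; it makes no claim about accuracy on real logs.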
