Lognroll: discovering accurate log templates by iterative filtering

Modern IT systems rely heavily on log analytics for critical operational tasks. Because the volume of logs produced by numerous distributed components is overwhelming, automated processing is required. The first step of automated log processing is to convert a stream of log lines into a sequence of IDs of log formats, called log templates. A log template is a base string with unfilled parts, from which log lines are generated at runtime by substituting contextual information. Discovering log templates from a large volume of collected logs is challenging because of the semi-structured nature of logs and the computational overhead involved. Our investigation reveals that existing techniques show various limitations. We approach log template discovery as a search-based learning problem by applying the ILP (Inductive Logic Programming) framework. The core of the algorithm narrows the logs down into smaller sets by analyzing the value compositions at selected log column positions. Our evaluation shows that it produces accurate log templates from diverse application logs at a small computational cost compared to existing methods. Using a quality metric that we define, we obtained improvements of about 21%-51% in log template quality.
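
The idea of narrowing logs into smaller sets by inspecting the values at selected column positions can be illustrated with a toy sketch. This is not the ILP-based algorithm described above. It is a simplified stand-in that assumes whitespace tokenization, groups lines by token count, and repeatedly splits a set on the column with the fewest distinct values. The function name `discover_templates`, the `max_distinct` threshold, and the `<*>` wildcard marker are all hypothetical choices made for this illustration.

```python
from collections import defaultdict

WILDCARD = "<*>"  # hypothetical marker for an unfilled template part


def discover_templates(lines, max_distinct=3):
    """Toy iterative filtering: split log sets on low-cardinality columns.

    A column with more than one but at most `max_distinct` distinct values
    is assumed to hold fixed template text, so the set is split on it.
    Columns with more distinct values are assumed to be runtime parameters.
    """
    # Lines with different token counts cannot share a template in this sketch.
    by_length = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if tokens:
            by_length[len(tokens)].append(tokens)

    templates = set()
    pending = list(by_length.values())
    while pending:
        group = pending.pop()
        # Distinct values observed at each column position of this set.
        columns = [set(col) for col in zip(*group)]
        candidates = [
            (len(values), pos)
            for pos, values in enumerate(columns)
            if 1 < len(values) <= max_distinct
        ]
        if not candidates:
            # Nothing left to split on: constant columns stay as text,
            # all other columns become wildcards.
            templates.add(
                " ".join(
                    next(iter(values)) if len(values) == 1 else WILDCARD
                    for values in columns
                )
            )
            continue
        # Split on the column with the fewest distinct values.
        _, pos = min(candidates)
        subsets = defaultdict(list)
        for tokens in group:
            subsets[tokens[pos]].append(tokens)
        pending.extend(subsets.values())
    return sorted(templates)
```

Given four lines of the form `Connected to <ip> port <n>` and four of the form `Disconnected from <ip> after <t>`, each with distinct parameter values, the sketch splits once on the first column and returns one template per form. A fixed cardinality threshold is a crude heuristic: in a small set, a parameter that takes only two or three values would be mistaken for template text.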
