Towards Automated Log Parsing for Large-Scale Log Data Analysis

Logs are widely used in system management for dependability assurance because they are often the only data available that record detailed system runtime behaviors in production. Because the size of logs is constantly increasing, developers (and operators) intend to automate their analysis by applying data mining methods, therefore structured input data (e.g., matrices) are required. This triggers a number of studies on log parsing that aims to transform free-text log messages into structured events. However, due to the lack of open-source implementations of these log parsers and benchmarks for performance comparison, developers are unlikely to be aware of the effectiveness of existing log parsers and their limitations when applying them into practice. They must often reimplement or redesign one, which is time-consuming and redundant. In this paper, we first present a characterization study of the current state of the art log parsers and evaluate their efficacy on five real-world datasets with over ten million log messages. We determine that, although the overall accuracy of these parsers is high, they are not robust across all datasets. When logs grow to a large scale (e.g., 200 million log messages), which is common in practice, these parsers are not efficient enough to handle such data on a single computer. To address the above limitations, we design and implement a parallel log parser (namely POP) on top of Spark, a large-scale data processing platform. Comprehensive experiments have been conducted to evaluate POP on both synthetic and real-world datasets. The evaluation results demonstrate the capability of POP in terms of accuracy, efficiency, and effectiveness on subsequent log mining tasks.

[1]  Shilin He,et al.  Experience Report: System Log Analysis for Anomaly Detection , 2016, 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE).

[2]  Huaimin Wang,et al.  Toward Fine-Grained, Unsupervised, Scalable Performance Diagnosis for Production Cloud Computing Systems , 2013, IEEE Transactions on Parallel and Distributed Systems.

[3]  Zhou Li,et al.  Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data , 2014, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[4]  Tao Li,et al.  LogSig: generating system events from raw textual logs , 2011, CIKM '11.

[5]  Evangelos E. Milios,et al.  Clustering event logs using iterative partitioning , 2009, KDD.

[6]  Evangelos E. Milios,et al.  A Lightweight Algorithm for Message Type Extraction in System Application Logs , 2012, IEEE Transactions on Knowledge and Data Engineering.

[7]  Yuriy Brun,et al.  Leveraging existing instrumentation to automatically infer invariant-constrained models , 2011, ESEC/FSE '11.

[8]  David Maier,et al.  The Complexity of Some Problems on Subsequences and Supersequences , 1978, JACM.

[9]  Jian Li,et al.  An Evaluation Study on Log Parsing and Its Use in Log Mining , 2016, 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[10]  Xiaohui Gu,et al.  ELT: Efficient Log-based Troubleshooting System for Cloud Computing Infrastructures , 2011, 2011 IEEE 30th International Symposium on Reliable Distributed Systems.

[11]  Wei Xu,et al.  System Problem Detection by Mining Console Logs , 2010 .

[12]  Guofei Jiang,et al.  LogMine: Fast Pattern Recognition for Log Analytics , 2016, CIKM.

[13]  J. Gower,et al.  Minimum Spanning Trees and Single Linkage Cluster Analysis , 1969 .

[14]  Domenico Cotroneo,et al.  Industry Practices and Event Logging: Assessment of a Critical Software Development Process , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[15]  Ragib Hasan,et al.  Towards Building Forensics Enabled Cloud Through Secure Logging-as-a-Service , 2016, IEEE Transactions on Dependable and Secure Computing.

[16]  Fan Yang,et al.  A comparison of general-purpose distributed systems for data processing , 2016, 2016 IEEE International Conference on Big Data (Big Data).

[17]  Qiang Fu,et al.  Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[18]  Miroslaw Malek,et al.  Comprehensive logfiles for autonomic systems , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..

[19]  Jon Stearley,et al.  What Supercomputers Say: A Study of Five System Logs , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[20]  Jennifer Rexford,et al.  Sensitivity of PCA for traffic anomaly detection , 2007, SIGMETRICS '07.

[21]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[22]  Ding Yuan,et al.  Characterizing logging practices in open-source software , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[23]  Qiang Fu,et al.  Learning to Log: Helping Developers Make Informed Logging Decisions , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[24]  Qiang Fu,et al.  Where do developers log? an empirical study on logging practices in industry , 2014, ICSE Companion.

[25]  David Lang Using SEC , 2013, login Usenix Mag..

[26]  Ravishankar K. Iyer,et al.  Automated Derivation of Application-Specific Error Detectors Using Dynamic Analysis , 2011, IEEE Transactions on Dependable and Secure Computing.

[27]  Sanjay Goel,et al.  Collaborative Search Log Sanitization: Toward Differential Privacy and Boosted Utility , 2015, IEEE Transactions on Dependable and Secure Computing.

[28]  Domenico Cotroneo,et al.  Assessing time coalescence techniques for the analysis of supercomputer logs , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).

[29]  Patrick Martin,et al.  Assisting developers of Big Data Analytics Applications when deploying on Hadoop clouds , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[30]  Michael I. Jordan,et al.  Detecting large-scale system problems by mining console logs , 2009, SOSP '09.

[31]  Risto Vaarandi,et al.  A data clustering algorithm for mining patterns from event logs , 2003, Proceedings of the 3rd IEEE Workshop on IP Operations & Management (IPOM 2003) (IEEE Cat. No.03EX764).

[32]  Yang Liu,et al.  Be conservative: enhancing failure diagnosis with proactive logging , 2012, OSDI 2012.

[33]  Jennifer Neville,et al.  Structured Comparative Analysis of Systems Logs to Diagnose Performance Problems , 2012, NSDI.

[34]  Mark Crovella,et al.  Diagnosing network-wide traffic anomalies , 2004, SIGCOMM '04.

[35]  Luo Si,et al.  LEAPS: Detecting Camouflaged Attacks with Statistical Learning Guided by Program Analysis , 2015, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[36]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[37]  Bojan Cukic,et al.  Log-Based Reliability Analysis of Software as a Service (SaaS) , 2010, 2010 IEEE 21st International Symposium on Software Reliability Engineering.