Machine Deserves Better Logging: A Log Enhancement Approach for Automatic Fault Diagnosis

When systems fail, log data is often the most important source of information for fault diagnosis. However, the performance of automatic fault diagnosis is limited by the ad-hoc nature of logs. The key problem is that existing developer-written logs are designed for humans to read, not for machines to detect system anomalies automatically. To improve the quality of logs for fault diagnosis, we propose a novel log enhancement approach that automatically identifies logging points that reflect anomalous behavior during system faults. We evaluate our approach on three popular software systems: AcmeAir, HDFS, and TensorFlow. Results show that it can significantly improve fault diagnosis accuracy, by 50% on average, compared to the developers' manually placed logging points.
