Which log level should developers choose for a new logging statement?

Logging statements record valuable runtime information about applications. Each logging statement is assigned a log level so that users can suppress verbose log messages while still printing the important ones. However, prior research finds that developers often have difficulty determining the appropriate level for their logging statements. In this paper, we propose an approach that helps developers determine the appropriate log level when they add a new logging statement. We analyze the development history of four open source projects (Hadoop, Directory Server, Hama, and Qpid) and leverage ordinal regression models to automatically suggest the most appropriate level for each newly added logging statement. First, we find that our ordinal regression model can accurately suggest the levels of logging statements, with an AUC (area under the curve; the higher the better) of 0.75 to 0.81 and a Brier score (the lower the better) of 0.44 to 0.66. This is better than randomly guessing the appropriate log level (an AUC of 0.50 and a Brier score of 0.80 to 0.83) or naively guessing the log level based on the proportional distribution of the log levels (an AUC of 0.50 and a Brier score of 0.65 to 0.76). Second, we find that the characteristics of the containing block of a newly added logging statement, the existing logging statements in the containing source code file, and the content of the newly added logging statement play important roles in determining the appropriate log level for that logging statement.
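To make the modeling setup concrete, below is a minimal sketch in Python, assuming statsmodels and scikit-learn are available; it is not the authors' implementation, and the feature names (block_sloc, file_log_density, log_text_length) are hypothetical stand-ins for the paper's metric families. It fits a proportional-odds ordinal regression and scores it with a multi-class Brier score and a one-vs-rest AUC (the paper uses Hand and Till's multi-class AUC, a related pairwise formulation).

    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel
    from sklearn.metrics import roc_auc_score

    # Hypothetical feature table: one row per newly added logging statement.
    # The columns are illustrative stand-ins for the paper's metric families
    # (containing-block characteristics, existing logging in the file, and
    # the content of the new logging statement).
    rng = np.random.default_rng(0)
    n = 500
    X = pd.DataFrame({
        "block_sloc": rng.integers(1, 50, n).astype(float),
        "file_log_density": rng.random(n),
        "log_text_length": rng.integers(5, 120, n).astype(float),
    })

    # Ordered outcome: trace < debug < info < warn < error.
    levels = ["trace", "debug", "info", "warn", "error"]
    y = pd.Series(pd.Categorical(rng.choice(levels, n),
                                 categories=levels, ordered=True))

    # Fit a proportional-odds (ordinal logistic) regression; the ordering of
    # the log levels is modeled through one set of coefficients plus
    # per-level thresholds, so no intercept column is added to X.
    result = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)

    # Predicted probability of each log level for every statement.
    probs = np.asarray(result.predict(X))  # shape (n, 5)

    # Multi-class Brier score: mean squared distance between the predicted
    # probability vector and the one-hot true level (lower is better).
    onehot = np.eye(len(levels))[y.cat.codes.to_numpy()]
    brier = np.mean(np.sum((probs - onehot) ** 2, axis=1))

    # One-vs-rest multi-class AUC (higher is better).
    auc = roc_auc_score(y.cat.codes, probs, multi_class="ovr")
    print(f"Brier score: {brier:.3f}, AUC: {auc:.3f}")

On purely random features the model has no signal, so the AUC will sit near the 0.50 random-guessing baseline described above; the 0.75 to 0.81 reported in the paper comes from real code metrics and the authors' own modeling pipeline.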
