Characterizing logging practices in Java-based open source software projects – a replication study in Apache Software Foundation

Log messages, which are generated by the debug statements that developers insert into the code at runtime, contain rich information about the runtime behavior of software systems. Log messages are used widely for system monitoring, problem diagnoses and legal compliances. Yuan et al. performed the first empirical study on the logging practices in open source software systems. They studied the development history of four C/C++ server-side projects and derived ten interesting findings. In this paper, we have performed a replication study in order to assess whether their findings would be applicable to Java projects in Apache Software Foundations. We examined 21 different Java-based open source projects from three different categories: server-side, client-side and supporting-component. Similar to the original study, our results show that all projects contain logging code, which is actively maintained. However, contrary to the original study, bug reports containing log messages take a longer time to resolve than bug reports without log messages. A significantly higher portion of log updates are for enhancing the quality of logs (e.g., formatting & style changes and spelling/grammar fixes) rather than co-changes with feature implementations (e.g., updating variable names).

[1]  Ahmed E. Hassan,et al.  MapReduce as a general framework to support research in Mining Software Repositories (MSR) , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[2]  Ahmed E. Hassan,et al.  Continuous validation of load test suites , 2014, ICPE.

[3]  Yuriy Brun,et al.  Inferring models of concurrent systems from logs of their behavior with CSight , 2014, ICSE.

[4]  Qiang Fu,et al.  Where do developers log? an empirical study on logging practices in industry , 2014, ICSE Companion.

[5]  Jacek Czerwonka,et al.  Code Ownership and Software Quality: A Replication Study , 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[6]  Peter Kampstra,et al.  Beanplot: A Boxplot Alternative for Visual Comparison of Distributions , 2008 .

[7]  Premkumar T. Devanbu,et al.  Sample size vs. bias in defect prediction , 2013, ESEC/FSE 2013.

[8]  D HerbslebJames,et al.  Two case studies of open source software development , 2002 .

[9]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[10]  Ding Yuan,et al.  SherLog: error diagnosis by connecting clues from run-time logs , 2010, ASPLOS 2010.

[11]  Ding Yuan,et al.  Characterizing logging practices in open-source software , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[12]  Qiang Fu,et al.  Learning to Log: Helping Developers Make Informed Logging Decisions , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[13]  Gilbert Hamann,et al.  Automatic identification of load testing problems , 2008, 2008 IEEE International Conference on Software Maintenance.

[14]  Forrest Shull,et al.  Building Knowledge through Families of Experiments , 1999, IEEE Trans. Software Eng..

[15]  Dorina C. Petriu,et al.  The Future of Software Performance Engineering , 2007, Future of Software Engineering (FOSE '07).

[16]  N. Nagappan,et al.  Use of relative code churn measures to predict system defect density , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[17]  Ding Yuan,et al.  SherLog: error diagnosis by connecting clues from run-time logs , 2010, ASPLOS XV.

[18]  Harald C. Gall,et al.  Replicating mining studies with SOFAS , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[19]  Ahmed E. Hassan,et al.  Replicating and Re-Evaluating the Theory of Relative Defect-Proneness , 2015, IEEE Transactions on Software Engineering.

[20]  Thomas Zimmermann,et al.  What Makes a Good Bug Report? , 2010, IEEE Trans. Software Eng..

[21]  Yuriy Brun,et al.  Leveraging existing instrumentation to automatically infer invariant-constrained models , 2011, ESEC/FSE '11.

[22]  Harald C. Gall,et al.  Change Analysis with Evolizer and ChangeDistiller , 2009, IEEE Software.

[23]  Harald C. Gall,et al.  Change Distilling:Tree Differencing for Fine-Grained Source Code Change Extraction , 2007, IEEE Transactions on Software Engineering.

[24]  Gilbert Hamann,et al.  Automated performance analysis of load tests , 2009, 2009 IEEE International Conference on Software Maintenance.

[25]  Thomas Zimmermann,et al.  What Makes a Good Bug Report? , 2008, IEEE Transactions on Software Engineering.

[26]  Michael I. Jordan,et al.  Detecting large-scale system problems by mining console logs , 2009, SOSP '09.

[27]  Václav Rajlich,et al.  Software evolution and maintenance , 2014, FOSE.

[28]  Patrick Martin,et al.  Assisting developers of Big Data Analytics Applications when deploying on Hadoop clouds , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[29]  Qiang Fu,et al.  Log2: A Cost-Aware Logging Mechanism for Performance Diagnosis , 2015, USENIX Annual Technical Conference.

[30]  J. Herbsleb,et al.  Two case studies of open source software development: Apache and Mozilla , 2002, TSEM.

[31]  Yuanyuan Zhou,et al.  /*icomment: bugs or bad comments?*/ , 2007, SOSP.

[32]  Ahmed E. Hassan,et al.  Studying the relationship between logging characteristics and the code quality of platform software , 2015, Empirical Software Engineering.

[33]  Daniel M. German,et al.  Open source software peer review practices , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[34]  Rahul Premraj,et al.  Network Versus Code Metrics to Predict Defects: A Replication Study , 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[35]  Ding Yuan,et al.  Improving Software Diagnosability via Log Enhancement , 2012, TOCS.

[36]  Michael C. Frank,et al.  Estimating the reproducibility of psychological science , 2015, Science.

[37]  Michael W. Godfrey,et al.  An Exploratory Study of the Evolution of Communicated Information about the Execution of Large Software Systems , 2011, WCRE.

[38]  A. Hassan,et al.  An Industrial Case Study of Customizing Operational Profiles Using Log Compression , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[39]  Gregorio Robles,et al.  Replicating MSR: A study of the potential replicability of papers published in the Mining Software Repositories proceedings , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[40]  Premkumar T. Devanbu,et al.  Fair and balanced?: bias in bug-fix datasets , 2009, ESEC/FSE '09.

[41]  Wei Xu,et al.  Advances and challenges in log analysis , 2011, Commun. ACM.