Studying the Use of Java Logging Utilities in the Wild

Software logging is widely used in practice. Logs have been used for a variety of purposes like debugging, monitoring, security compliance, and business analytics. Instead of directly invoking the standard output functions, developers usually prefer to use logging utilities (LUs) (e.g., SLF4J), which provide additional functionalities like thread-safety and verbosity level support, to instrument their source code. Many of the previous research works on software logging are focused on the log printing code. There are very few works studying the use of LUs, although new LUs are constantly being introduced by companies and researchers. In this paper, we conducted a large-scale empirical study on the use of Java LUs in the wild. We analyzed the use of 3,856 LUs from 11, 194 projects in GitHub and found that many projects have complex usage patterns for LUs. For example, 75.8% of the large-sized projects have implemented their own LUs in their projects. More than 50% of these projects use at least three LUs. We conducted further qualitative studies to better understand and characterize the complex use of LUs. Our findings show that different LUs are used for a variety of reasons (e.g., internationalization of the log messages). Some projects develop their own LUs to satisfy project-specific logging needs (e.g., defining the logging format). Multiple uses of LUs in one project are pretty common for large and very large-sized projects mainly for context like enabling and configuring the logging behavior for the imported packages. Interviewing with 13 industrial developers showed that our findings are also generally true for industrial projects and are considered as very helpful for them to better configure and manage the logging behavior for their projects. The findings and the implications presented in this paper will be useful for developers and researchers who are interested in developing and maintaining LUs.

[1]  John K. Ousterhout,et al.  NanoLog: A Nanosecond Scale Logging System , 2018, USENIX Annual Technical Conference.

[2]  Yang Liu,et al.  Be conservative: enhancing failure diagnosis with proactive logging , 2012, OSDI 2012.

[3]  Daniela E. Damian,et al.  The promises and perils of mining GitHub , 2009, MSR 2014.

[4]  Cor-Paul Bezemer,et al.  Logging Library Migrations: A Case Study for the Apache Software Foundation Projects , 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[5]  Yves Le Traon,et al.  Usage and testability of AOP: An empirical study of AspectJ , 2013, Inf. Softw. Technol..

[6]  Jimmy J. Lin,et al.  Scaling big data mining infrastructure: the twitter experience , 2013, SKDD.

[7]  Michael W. Godfrey,et al.  An Exploratory Study of the Evolution of Communicated Information about the Execution of Large Software Systems , 2011, 2011 18th Working Conference on Reverse Engineering.

[8]  Nikolaos Tsantalis,et al.  Studying and detecting log-related issues , 2018, Empirical Software Engineering.

[9]  Robert W. Bowdidge,et al.  Why don't software developers use static analysis tools to find bugs? , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[10]  W. Kruskal,et al.  Use of Ranks in One-Criterion Variance Analysis , 1952 .

[11]  Wei Xu,et al.  Advances and challenges in log analysis , 2011, Commun. ACM.

[12]  Sven Apel,et al.  How AspectJ is Used: An Analysis of Eleven AspectJ Programs , 2010, J. Object Technol..

[13]  Ding Yuan,et al.  Characterizing logging practices in open-source software , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[14]  Qiang Fu,et al.  Learning to Log: Helping Developers Make Informed Logging Decisions , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[15]  Jinfu Chen,et al.  Studying the characteristics of logging practices in mobile apps: a case study on F-Droid , 2019, Empirical Software Engineering.

[16]  Petra Perner,et al.  Data Mining - Concepts and Techniques , 2002, Künstliche Intell..

[17]  Zhen Ming Jiang,et al.  Characterizing logging practices in Java-based open source software projects – a replication study in Apache Software Foundation , 2016, Empirical Software Engineering.

[18]  Mauricio A. Saca Refactoring improving the design of existing code , 2017, 2017 IEEE 37th Central America and Panama Convention (CONCAPAN XXXVII).

[19]  Shilin He,et al.  Characterizing the Natural Language Descriptions in Software Logging Statements , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[20]  Premkumar T. Devanbu,et al.  A large scale study of programming languages and code quality in github , 2014, SIGSOFT FSE.

[21]  Domenico Cotroneo,et al.  Industry Practices and Event Logging: Assessment of a Critical Software Development Process , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[22]  Qiang Fu,et al.  Log2: A Cost-Aware Logging Mechanism for Performance Diagnosis , 2015, USENIX Annual Technical Conference.

[23]  Zhenhao Li,et al.  Characterizing and Detecting Duplicate Logging Code Smells , 2019, 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion).

[24]  Ahmed E. Hassan,et al.  Studying the relationship between logging characteristics and the code quality of platform software , 2015, Empirical Software Engineering.

[25]  Mark Marron,et al.  Log++ logging for a cloud-native world , 2018, DLS.

[26]  Richard C. Holt,et al.  Using development history sticky notes to understand software architecture , 2004, Proceedings. 12th IEEE International Workshop on Program Comprehension, 2004..

[27]  Steven M. Drucker,et al.  The Bones of the System: A Case Study of Logging and Telemetry at Microsoft , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C).

[28]  Qiang Fu,et al.  Where do developers log? an empirical study on logging practices in industry , 2014, ICSE Companion.

[29]  Fabio Petrillo,et al.  Software Configuration Engineering in Practice Interviews, Survey, and Systematic Literature Review , 2020, IEEE Transactions on Software Engineering.

[30]  John R. Gilbert,et al.  Aspect-Oriented Programming of Sparse Matrix Code , 1997, ISCOPE.

[31]  Zibin Zheng,et al.  Tools and Benchmarks for Automated Log Parsing , 2018, 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP).

[32]  Ding Yuan,et al.  Improving Software Diagnosability via Log Enhancement , 2012, TOCS.

[33]  Ying Zou,et al.  Towards just-in-time suggestions for log changes , 2016, Empirical Software Engineering.

[34]  Cristina V. Lopes,et al.  How scale affects structure in Java programs , 2015, OOPSLA.

[35]  Tom Mens,et al.  Towards a survival analysis of database framework usage in Java projects , 2015, 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[36]  Tao Xie,et al.  An Exploratory Study of Logging Configuration Practice in Java , 2019, 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[37]  Zhen Ming Jiang,et al.  Characterizing and Detecting Anti-Patterns in the Logging Code , 2017, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE).

[38]  Heng Li,et al.  Which log level should developers choose for a new logging statement? , 2017, Empirical Software Engineering.

[39]  Jez Humble,et al.  Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation , 2010 .

[40]  Jinqiu Yang,et al.  DLFinder: Characterizing and Detecting Duplicate Logging Code Smells , 2019, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[41]  Daniela E. Damian,et al.  Selecting Empirical Methods for Software Engineering Research , 2008, Guide to Advanced Empirical Software Engineering.

[42]  Tom Mens,et al.  Analyzing the evolution of testing library usage in open source Java projects , 2017, 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER).

[43]  David Lo,et al.  A Large Scale Study of Multiple Programming Languages and Code Quality , 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[44]  Boyuan Chen,et al.  Extracting and studying the Logging-Code-Issue- Introducing changes in Java-based large-scale open source software systems , 2019, Empirical Software Engineering.

[45]  Georgios Gousios,et al.  The GHTorent dataset and tool suite , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[46]  Yu Luo,et al.  Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold , 2017, SOSP.

[47]  Ding Yuan,et al.  SherLog: error diagnosis by connecting clues from run-time logs , 2010, ASPLOS XV.

[48]  David Lo,et al.  A Large Scale Study of Long-Time Contributor Prediction for GitHub Projects , 2021, IEEE Transactions on Software Engineering.