Two Level Empirical Study of Logging Statements in Open Source Java Projects

Log statements present in source code provide important information to the software developers because they are useful in various software development activities. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this paper, the authors perform an in-depth and large-scale analysis of logging code constructs at two levels. They answer nine research questions related to statistical and content analysis. Statistical analysis at file level reveals that fewer files consist of log statements but logged files have a greater complexity than that of non-logged files. Results show that a positive correlation exists between size and logging count of the logged files. Statistical analysis on catch-blocks show that try-blocks associated with logged catch-blocks have greater complexity than non-logged catch-blocks and the logging ratio of an exception type is project specific. Content-based analysis of catch-blocks reveals the presence of different topics in try-blocks associated with logged and non-logged catch-blocks.

[1]  Qiang Fu,et al.  Where do developers log? an empirical study on logging practices in industry , 2014, ICSE Companion.

[2]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[3]  Jennifer Neville,et al.  Structured Comparative Analysis of Systems Logs to Diagnose Performance Problems , 2012, NSDI.

[4]  Ahmed E. Hassan,et al.  Studying software evolution using topic models , 2014, Sci. Comput. Program..

[5]  Tong-Seng Quah,et al.  Application of neural networks for software quality prediction using object-oriented metrics , 2005, J. Syst. Softw..

[6]  Ahmed E. Hassan,et al.  What are developers talking about? An analysis of topics and trends in Stack Overflow , 2014, Empirical Software Engineering.

[7]  Ahmed E. Hassan,et al.  Studying the relationship between logging characteristics and the code quality of platform software , 2015, Empirical Software Engineering.

[8]  Harald C. Gall,et al.  Cross-project defect prediction: a large scale experiment on data vs. domain vs. process , 2009, ESEC/SIGSOFT FSE.

[9]  Santonu Sarkar,et al.  Mining business topics in source code using latent dirichlet allocation , 2008, ISEC '08.

[10]  Leonardo Mariani,et al.  Automated Identification of Failure Causes in System Logs , 2008, 2008 19th International Symposium on Software Reliability Engineering (ISSRE).

[11]  Denys Poshyvanyk,et al.  Using Latent Dirichlet Allocation for automatic categorization of software , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[12]  Ashish Sureka,et al.  LogOpt: Static Feature Extraction from Source Code for Automated Catch Block Logging Prediction , 2016, ISEC.

[13]  Walid Maalej,et al.  How do open source communities blog? , 2012, Empirical Software Engineering.

[14]  Ding Yuan,et al.  SherLog: error diagnosis by connecting clues from run-time logs , 2010, ASPLOS 2010.

[15]  Ding Yuan,et al.  Characterizing logging practices in open-source software , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[16]  Qiang Fu,et al.  Learning to Log: Helping Developers Make Informed Logging Decisions , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[17]  Chita R. Das,et al.  Modeling and synthesizing task placement constraints in Google compute clusters , 2011, SoCC.

[18]  Qiang Fu,et al.  Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[19]  Randy H. Katz,et al.  Chukwa: A System for Reliable Large-Scale Log Collection , 2010, LISA.

[20]  Michael I. Jordan,et al.  Detecting large-scale system problems by mining console logs , 2009, SOSP '09.