Classifying Code Comments in Java Open-Source Software Systems

Code comments are a key software component containing information about the underlying implementation. Several studies have shown that code comments enhance the readability of the code. Nevertheless, not all the comments have the same goal and target audience. In this paper, we investigate how six diverse Java OSS projects use code comments, with the aim of understanding their purpose. Through our analysis, we produce a taxonomy of source code comments, subsequently, we investigate how often each category occur by manually classifying more than 2,000 code comments from the aforementioned projects. In addition, we conduct an initial evaluation on how to automatically classify code comments at line level into our taxonomy using machine learning, initial results are promising and suggest that an accurate classification is within reach.

[1]  Houari A. Sahraoui,et al.  How Good is Your Comment? A Study of Comments in Java Programs , 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[2]  Ted Tenny,et al.  Program Readability: Procedures Versus Comments , 1988, IEEE Trans. Software Eng..

[3]  Elmar Jürgens,et al.  Quality analysis of source code comments , 2013, 2013 21st International Conference on Program Comprehension (ICPC).

[4]  Yuanyuan Zhou,et al.  /*icomment: bugs or bad comments?*/ , 2007, SOSP.

[5]  Scott N. Woodfield,et al.  The effect of modularization and comments on program comprehension , 1981, ICSE '81.

[6]  Mario F. Triola,et al.  Elementary Statistics Using Excel (3rd Edition) , 2006 .

[7]  Ioannis Stamelos,et al.  Code quality analysis in open source software development , 2002, Inf. Syst. J..

[8]  Andrian Marcus,et al.  Recovery of Traceability Links between Software Documentation and Source Code , 2005, Int. J. Softw. Eng. Knowl. Eng..

[9]  David W. Binkley,et al.  Leveraged Quality Assessment using Information Retrieval Techniques , 2006, 14th IEEE International Conference on Program Comprehension (ICPC'06).

[10]  M. Kendall Elementary Statistics , 1945, Nature.

[11]  Yonggang Zhang,et al.  Empowering Software Maintainers with Semantic Web Technologies , 2007, ESWC.

[12]  Ahmed E. Hassan,et al.  Examining the evolution of code comments in PostgreSQL , 2006, MSR '06.

[13]  Nicolas Anquetil,et al.  A study of the documentation essential to software maintenance , 2005, SIGDOC '05.

[14]  B. Flyvbjerg Five Misunderstandings About Case-Study Research , 2006, 1304.1186.

[15]  P. Oman,et al.  Metrics for assessing a software system's maintainability , 1992, Proceedings Conference on Software Maintenance 1992.

[16]  Carl S. Hartzman,et al.  Maintenance productivity: observations based on an experience in a large system environment , 1993, CASCON.

[17]  Emily Hill,et al.  Towards automatically generating summary comments for Java methods , 2010, ASE.

[18]  Yuanyuan Zhou,et al.  Listening to programmers — Taxonomies and characteristics of comments in operating system code , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[19]  William Lidwell,et al.  Universal principles of design : 100 ways to enhance usability,influence perception, increase appeal, make better, designdecisions, and teach through design , 2003 .

[20]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[21]  Andrian Marcus,et al.  Recovering documentation-to-source-code traceability links using latent semantic indexing , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[22]  Harald C. Gall,et al.  Do Code and Comments Co-Evolve? On the Relation between Source Code and Comment Changes , 2007, 14th Working Conference on Reverse Engineering (WCRE 2007).

[23]  Ted Tenny,et al.  Procedures and comments vs. the banker's algorithm , 1985, SGCS.

[24]  Alberto Bacchelli,et al.  Content classification of development emails , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[25]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[26]  Martin P. Robillard,et al.  Patterns of Knowledge in API Reference Documentation , 2013, IEEE Transactions on Software Engineering.

[27]  Manuel J. Barranco García,et al.  Maintainability as a key factor in maintenance productivity: a case study , 1996, 1996 Proceedings of International Conference on Software Maintenance.

[28]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[29]  Giuliano Antoniol,et al.  Information retrieval models for recovering traceability links between code and documentation , 2000, Proceedings 2000 International Conference on Software Maintenance.