On the relationship between comment update practices and software bugs

When changing source code, developers sometimes update the comments associated with that code (a consistent update) and sometimes do not (an inconsistent update). Similarly, developers sometimes update a comment without changing its associated code (also an inconsistent update). The relationship between such comment update practices and software bugs has never been explored empirically. While some (in)consistent updates might be harmless, software engineering folklore warns of the risks of inconsistent updates between code and comments, because such updates are likely to leave comments out of date, which in turn might mislead developers and cause them to introduce bugs in the future. In this paper, we study comment update practices in three large open-source systems written in C (FreeBSD and PostgreSQL) and Java (Eclipse). We find that these practices can explain and predict future bugs better than other indicators such as the number of prior bugs or changes. Our findings suggest that inconsistent changes are not necessarily correlated with more bugs. Instead, a change in which a function and its comment are suddenly updated inconsistently when they are usually updated consistently (or vice versa) is risky, carrying a high probability of introducing a bug, and should be reviewed carefully by practitioners.
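
The distinction drawn above can be illustrated with a small sketch. It is not the paper's tooling or model: the `Change` record, the `classify` and `flag_deviations` helpers, and the `min_history` and `threshold` parameters are all hypothetical names and values chosen for illustration. The sketch labels each change to a function as consistent (code and comment both changed) or inconsistent (only one of them changed), then flags a change whose label departs from the style that has dominated that function's earlier history.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One change to a function (hypothetical record, not the paper's data model)."""
    function: str
    code_changed: bool
    comment_changed: bool


def classify(change):
    """Label a change as 'consistent' or 'inconsistent' per the abstract's definitions."""
    if not (change.code_changed or change.comment_changed):
        raise ValueError("a change must touch the code, the comment, or both")
    if change.code_changed and change.comment_changed:
        return "consistent"
    # Code-only and comment-only updates both count as inconsistent.
    return "inconsistent"


def flag_deviations(history, min_history=3, threshold=0.8):
    """Return changes whose update style departs from the function's usual style.

    min_history and threshold are illustrative assumptions, not values
    reported in the paper.
    """
    seen = {}      # function name -> Counter of past update styles
    flagged = []
    for change in history:  # history is assumed to be in chronological order
        kind = classify(change)
        counts = seen.setdefault(change.function, Counter())
        total = sum(counts.values())
        if total >= min_history:
            dominant, n = counts.most_common(1)[0]
            # Flag only when one style clearly dominates and this change breaks it.
            if n / total >= threshold and kind != dominant:
                flagged.append(change)
        counts[kind] += 1
    return flagged


if __name__ == "__main__":
    history = [Change("parse", True, True)] * 4 + [Change("parse", True, False)]
    for change in flag_deviations(history):
        print("review carefully:", change)
```

In this toy history, the function `parse` is updated consistently four times and then receives a code-only change, so only the last change is flagged. A function that has always been updated inconsistently would not be flagged for another inconsistent change, matching the observation that inconsistency alone is not necessarily correlated with more bugs.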
