Learning a Metric for Code Readability

In this paper, we explore the concept of code readability and investigate its relation to software quality. With data collected from 120 human annotators, we derive associations between a simple set of local code features and human notions of readability. Using those features, we construct an automated readability measure and show that it can be 80 percent effective, and better on average than a human, at predicting readability judgments. Furthermore, we show that this metric correlates strongly with three measures of software quality: code changes, automated defect reports, and defect log messages. We measure these correlations on over 2.2 million lines of code, as well as longitudinally, over many releases of selected projects. Finally, we discuss the implications of this study for programming language design and engineering practice. For example, our data suggest that comments, in and of themselves, are less important than simple blank lines to local judgments of readability.
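To make the notion of "local code features" concrete, the sketch below computes a few surface-level properties of a snippet: the share of blank lines, the share of comment lines, and line lengths. It is only an illustration under stated assumptions. The function name `local_features`, the particular features chosen, and the `//` comment prefix are all hypothetical and are not taken from the paper's actual feature set or model.

```python
from typing import Dict, Tuple


def local_features(snippet: str,
                   comment_prefixes: Tuple[str, ...] = ("//",)) -> Dict[str, float]:
    """Compute a few surface-level features of a code snippet.

    Hypothetical helper for illustration only; the feature list and the
    default comment prefix are assumptions, not the paper's definitions.
    """
    lines = snippet.splitlines()
    n = len(lines) or 1  # avoid division by zero on empty input

    # A line counts as blank if it contains only whitespace.
    blank = sum(1 for ln in lines if not ln.strip())
    # A line counts as a comment if it starts with one of the given prefixes.
    comment = sum(1 for ln in lines if ln.strip().startswith(comment_prefixes))

    return {
        "blank_line_ratio": blank / n,
        "comment_line_ratio": comment / n,
        "avg_line_length": sum(len(ln) for ln in lines) / n,
        "max_line_length": float(max((len(ln) for ln in lines), default=0)),
    }


if __name__ == "__main__":
    example = "int x = 0;\n\n// add one\nx = x + 1;\n"
    for name, value in local_features(example).items():
        print(f"{name}: {value}")
```

Feature vectors of this kind could then be paired with human readability scores to train a classifier. That training step is omitted here because the abstract does not specify it.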
