Towards a Naming Quality Model

Highly maintainable software decreases the time spent on development. Although various research efforts show that identifier names play a large role in the readability and maintainability of code, code quality assessments often do not take these names into account. While developers can usually judge the quality of a name quickly, the abstract nature of names makes a fully automated assessment difficult. This research investigates the creation of a general naming quality model. Our proposed model assesses a) the syntactic quality of Java method names and b) how well a method body matches its name semantically. We assess these aspects using 1) a set of guidelines from the literature and 2) a machine learning algorithm trained on AST representations of method bodies. Initial results show that the combination of a rule-based approach and a deep learning model can correctly indicate which names need attention. By inspecting the names flagged as violations by both approaches, we found that combining syntactic and semantic information yields better results than either source of information on its own. Further validation experiments on a GitHub commit dataset show that the model can distinguish between good […]
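To illustrate the rule-based side of such a model (a minimal sketch, not the paper's actual implementation), the example below checks a Java method name against a few syntactic guidelines that recur in the naming literature: lowerCamelCase formatting, a reasonable length, and starting with a verb. The verb whitelist and the length thresholds are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Minimal sketch of a rule-based syntactic check for Java method names. */
public class NameChecker {

    // Small illustrative verb whitelist; a real checker would use a POS tagger or a larger lexicon.
    private static final Set<String> KNOWN_VERBS =
            Set.of("get", "set", "is", "has", "create", "build", "compute", "find", "add", "remove", "parse");

    public static List<String> check(String methodName) {
        List<String> violations = new ArrayList<>();

        // Guideline: method names should be lowerCamelCase without underscores.
        if (!methodName.matches("[a-z][a-zA-Z0-9]*")) {
            violations.add("not lowerCamelCase");
        }

        // Guideline: avoid single-character or excessively long names (thresholds are assumptions).
        if (methodName.length() < 3 || methodName.length() > 40) {
            violations.add("name length outside expected range");
        }

        // Guideline: a method performs an action, so its name should start with a verb.
        String firstWord = methodName.split("(?=[A-Z])")[0].toLowerCase();
        if (!KNOWN_VERBS.contains(firstWord)) {
            violations.add("does not start with a known verb");
        }

        return violations;
    }

    public static void main(String[] args) {
        System.out.println(check("getCustomerName")); // []
        System.out.println(check("data"));            // flags the missing verb
        System.out.println(check("Do_Stuff"));        // flags casing and the verb check
    }
}
```

For the semantic side, the model compares a learned representation of the method body (AST-based, in the spirit of code2vec/code2seq embeddings) against the method name. Assuming the two embeddings are already computed by some learned model, a minimal comparison could be a cosine similarity with a threshold; the threshold value here is an assumption.

```java
/** Sketch: flag a name/body mismatch when embedding similarity drops below a threshold. */
public class SemanticMatch {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static boolean nameMatchesBody(double[] nameEmbedding, double[] bodyEmbedding) {
        return cosine(nameEmbedding, bodyEmbedding) >= 0.5; // threshold chosen for illustration only
    }
}
```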
