Towards a Naming Quality Model

Highly maintainable software decreases the time spent on development. Although various research efforts show that identifier names play a large role in the readability and maintainability of code, code quality assessments often do not take these names into account. While developers can usually judge the quality of a name quickly, the abstract nature of names makes a fully automated assessment difficult. This research investigates the creation of a general naming quality model. Our proposed model assesses a) the syntactic quality of Java method names and b) how well a method body matches its name semantically. We assess these aspects using 1) a set of guidelines from the literature and 2) a machine learning algorithm trained on AST representations of method bodies. Initial results show that the combination of a rule-based approach and a deep learning model can correctly indicate which names need attention. By inspecting the names flagged as violations by both approaches, we found that combining syntactic and semantic information yields better results than either source of information on its own. Further validation experiments on a GitHub commit dataset show that the model can distinguish between good […]
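To illustrate the rule-based side of such a model (a minimal sketch, not the paper's actual implementation), the example below checks a Java method name against a few syntactic guidelines that recur in the naming literature: lowerCamelCase formatting, a reasonable length, and starting with a verb. The verb whitelist and the length thresholds are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Minimal sketch of a rule-based syntactic check for Java method names. */
public class NameChecker {

    // Small illustrative verb whitelist; a real checker would use a POS tagger or a larger lexicon.
    private static final Set<String> KNOWN_VERBS =
            Set.of("get", "set", "is", "has", "create", "build", "compute", "find", "add", "remove", "parse");

    public static List<String> check(String methodName) {
        List<String> violations = new ArrayList<>();

        // Guideline: method names should be lowerCamelCase without underscores.
        if (!methodName.matches("[a-z][a-zA-Z0-9]*")) {
            violations.add("not lowerCamelCase");
        }

        // Guideline: avoid single-character or excessively long names (thresholds are assumptions).
        if (methodName.length() < 3 || methodName.length() > 40) {
            violations.add("name length outside expected range");
        }

        // Guideline: a method performs an action, so its name should start with a verb.
        String firstWord = methodName.split("(?=[A-Z])")[0].toLowerCase();
        if (!KNOWN_VERBS.contains(firstWord)) {
            violations.add("does not start with a known verb");
        }

        return violations;
    }

    public static void main(String[] args) {
        System.out.println(check("getCustomerName")); // []
        System.out.println(check("data"));            // flags the missing verb
        System.out.println(check("Do_Stuff"));        // flags casing and the verb check
    }
}
```

For the semantic side, the model compares a learned representation of the method body (AST-based, in the spirit of code2vec/code2seq embeddings) against the method name. Assuming the two embeddings are already computed by some learned model, a minimal comparison could be a cosine similarity with a threshold; the threshold value here is an assumption.

```java
/** Sketch: flag a name/body mismatch when embedding similarity drops below a threshold. */
public class SemanticMatch {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static boolean nameMatchesBody(double[] nameEmbedding, double[] bodyEmbedding) {
        return cosine(nameEmbedding, bodyEmbedding) >= 0.5; // threshold chosen for illustration only
    }
}
```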
