A Naming Pattern Based Approach for Method Name Recommendation

Method names in software projects help developers understand what a method does. Existing state-of-the-art automated approaches tend to predict the tokens that compose a method name from the method's context. However, a method name is not a simple combination of tokens: it is structured and follows repetitive naming patterns (e.g., “get __”, “create __”). Through a large-scale empirical analysis of 15M methods from 14K real-world Java projects, we found that naming patterns recur in method names and that two functionally similar methods usually share the same naming pattern. Based on this empirical study, we propose NamPat, a naming pattern-based approach for method name recommendation. Specifically, for a target method, NamPat first retrieves the most similar method from the training data by estimating the similarity of their body code. The name of that method then serves as a pattern guider that provides the naming pattern, and NamPat combines it with the context information of the target method to recommend a name. To evaluate the approach, we conducted experiments on 17M methods from a widely used Java dataset. Compared with Code2vec, Code2seq, MNire, and Cognac, NamPat improves on these state-of-the-art approaches in precision (5.8%-27.1%), recall (11.1%-60.1%), and F-score (8.5%-43.9%), demonstrating the effectiveness of the proposed approach.
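The retrieval step described above can be illustrated with a small sketch. It is not NamPat's implementation: the abstract does not say how body similarity is estimated, so Jaccard overlap of identifier subtokens is used here purely as a stand-in, and the helper names (`split_identifier`, `retrieve_most_similar`, `naming_pattern`), the toy corpus, and the choice of the first subtoken as the pattern are all assumptions made for illustration. The step that combines the pattern with the target method's context to generate a name is omitted.

```python
import re

def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase subtokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

def body_tokens(body):
    """Collect the set of subtokens of every identifier-like word in a body."""
    tokens = set()
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", body):
        tokens.update(split_identifier(ident))
    return tokens

def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def retrieve_most_similar(target_body, corpus):
    """Return the (name, body) pair whose body is most similar to the target."""
    target = body_tokens(target_body)
    return max(corpus, key=lambda m: jaccard(target, body_tokens(m[1])))

def naming_pattern(name):
    """Reduce a method name to a pattern: leading subtoken plus an open slot."""
    tokens = split_identifier(name)
    return tokens[0] + " __" if tokens else "__"

# Hypothetical toy training corpus of (method name, method body) pairs.
CORPUS = [
    ("getUserName", "return this.userName;"),
    ("createConnection", "Connection c = new Connection(host, port); return c;"),
    ("isEmpty", "return this.size == 0;"),
]

guide_name, _ = retrieve_most_similar("return this.userEmail;", CORPUS)
print(guide_name, "->", naming_pattern(guide_name))  # getUserName -> get __
```

In this toy setting the retrieved name only supplies the “get __” pattern; filling the slot from the target method's context is the part the abstract assigns to the recommendation step.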
