Paper information - LSTM-based argument recommendation for non-API methods

LSTM-based argument recommendation for non-API methods

Automatic code completion is one of the most useful features provided by advanced IDEs. Argument recommendation, a special kind of code completion, is widely used as well. While existing approaches focus on argument recommendation for popular APIs, a large number of non-API invocations also call for accurate argument recommendation. To this end, we propose an LSTM-based approach that recommends non-API arguments instantly as method calls are typed in. With data collected from a large corpus of open-source applications, we train an LSTM neural network to recommend actual arguments based on the identifiers of the invoked method, the corresponding formal parameter, and a list of syntactically correct candidate arguments. To feed these identifiers into the LSTM neural network, we convert them into fixed-length vectors with Paragraph Vector, an unsupervised, neural-network-based learning algorithm. Given the LSTM neural network trained on sample applications, we can predict, for a given call site, which of the candidate arguments is most likely to be the correct one. We evaluate the proposed approach with tenfold cross-validation on 85 open-source C applications. Results suggest that the proposed approach outperforms state-of-the-art approaches in recommending non-API arguments, improving precision significantly from 71.46% to 83.37%.
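The shape of the task described in the abstract (a method name, a formal parameter name, and a list of candidate arguments go in; a ranked list of candidates comes out) can be illustrated with a small sketch. This is not the paper's model. As loudly labeled assumptions, a bag of identifier subtokens stands in for the Paragraph Vector embedding, and cosine similarity stands in for the trained LSTM scorer. The function names and the example call site (`copy_buffer`, `dest_len`, and the candidate list) are hypothetical.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split an identifier into lowercase subtokens (snake_case and camelCase)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def embed(name):
    """Bag-of-subtokens vector. ASSUMPTION: a stand-in for Paragraph Vector."""
    return Counter(split_identifier(name))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(method_name, parameter_name, candidates):
    """Rank candidate arguments for one call site, best first.

    ASSUMPTION: lexical similarity to the method and parameter names
    replaces the trained LSTM scorer described in the abstract.
    """
    context = embed(method_name) + embed(parameter_name)
    scored = [(cosine(embed(c), context), c) for c in candidates]
    # sorted() is stable, so ties keep their original candidate order.
    return [c for _, c in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical call site: copy_buffer(..., dest_len = ?)
ranking = rank_candidates("copy_buffer", "dest_len", ["src", "dest", "destLen", "i"])
print(ranking)
```

In a learned setting, `embed` and the scoring step would be replaced by models trained on the collected corpus; the surrounding interface (one call site in, one ranked candidate list out) would stay the same.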
