Stepwise API usage assistance using n-gram language models

Highlights: (1) an IDE code completion mechanism for stepwise API exploration; (2) a method for building a language model to support (1); (3) a detailed evaluation of (2), showing how n-gram models perform against existing client code; (4) results of (3) revealing a more than 90% chance of finding the expected result among the top-10 code completion proposals.

Reusing software involves learning third-party APIs, a process that is often time-consuming and error-prone. Recommendation systems for API usage assistance, based on statistical models built from source code corpora, can assist API users through code completion mechanisms in IDEs. A valid sequence of API calls involving different types may be regarded as a well-formed sentence of tokens from the API vocabulary. In this article we describe an approach for recommending subsequent tokens to complete API sentences using n-gram language models built from source code corpora. The resulting system was integrated into the code completion facilities of the Eclipse IDE, providing contextualized completion proposals for Java that take the nearest lines of code into account. The approach was evaluated against existing client code of four widely used APIs, revealing that in more than 90% of the cases the expected subsequent token is among the top-10 proposals of our models. This high score provides evidence that the recommendations could help with API learning and exploration, namely by assisting developers in writing valid API sentences.
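To illustrate the idea of ranking next-token proposals with an n-gram model over API call sequences, the following is a minimal sketch, not the authors' implementation: the toy training "sentences", the class and method names, and the add-one smoothing are illustrative assumptions only.

```python
# Sketch: recommend the next API token given the last n-1 tokens already written.
from collections import Counter, defaultdict

class NGramRecommender:
    def __init__(self, n=3):
        self.n = n
        self.context_counts = defaultdict(Counter)  # (n-1)-token context -> next-token counts
        self.vocab = set()

    def train(self, sentences):
        """Each sentence is a list of API tokens (e.g. calls observed in client code)."""
        for tokens in sentences:
            padded = ["<s>"] * (self.n - 1) + tokens + ["</s>"]
            self.vocab.update(padded)
            for i in range(self.n - 1, len(padded)):
                context = tuple(padded[i - self.n + 1 : i])
                self.context_counts[context][padded[i]] += 1

    def propose(self, prefix, k=10):
        """Rank candidate next tokens for the tokens typed so far."""
        context = tuple((["<s>"] * (self.n - 1) + prefix)[-(self.n - 1):])
        counts = self.context_counts.get(context, Counter())
        total = sum(counts.values())
        v = len(self.vocab) or 1
        # Add-one smoothing so unseen tokens still receive a nonzero score.
        scored = {t: (counts[t] + 1) / (total + v) for t in self.vocab}
        return sorted(scored, key=scored.get, reverse=True)[:k]

# Toy corpus of API "sentences" over java.io types (illustrative only).
corpus = [
    ["File.new", "FileReader.new", "BufferedReader.new",
     "BufferedReader.readLine", "BufferedReader.close"],
    ["File.new", "FileReader.new", "BufferedReader.new",
     "BufferedReader.readLine", "BufferedReader.readLine", "BufferedReader.close"],
]
model = NGramRecommender(n=3)
model.train(corpus)
print(model.propose(["FileReader.new", "BufferedReader.new"], k=10))
```

In an IDE setting, the prefix would be extracted from the lines of code nearest to the cursor, and the top-k ranked tokens would populate the completion pop-up; a production model would use a stronger smoothing scheme than the add-one variant shown here.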
