Pythia: AI-assisted Code Completion System

In this paper, we propose a novel end-to-end approach for AI-assisted code completion called Pythia. It generates ranked lists of method and API recommendations which can be used by software developers at edit time. The system is currently deployed as part of Intellicode extension in Visual Studio Code IDE. Pythia exploits state-of-the-art large-scale deep learning models trained on code contexts extracted from abstract syntax trees. It is designed to work at a high throughput predicting the best matching code completions on the order of 100 ms. We describe the architecture of the system, perform comparisons to frequency-based approach and invocation-based Markov Chain language model, and discuss challenges serving Pythia models on lightweight client devices. The offline evaluation results obtained on 2700 Python open source software GitHub repositories show a top-5 accuracy of 92%, surpassing the baseline models by 20% averaged over classes, for both intra and cross-project settings.

[1]  Hakan Inan,et al.  Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling , 2016, ICLR.

[2]  Mik Kersten,et al.  How are Java software developers using the Elipse IDE? , 2006, IEEE Software.

[3]  Jürgen Schmidhuber,et al.  Learning to forget: continual prediction with LSTM , 1999 .

[4]  Ruzica Piskac,et al.  Complete completion using types and weights , 2013, PLDI.

[5]  Tao Luo,et al.  Using sequential and non-sequential patterns in predictive Web usage mining tasks , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[6]  Eric Bodden,et al.  Effective API navigation and reuse , 2010, 2010 IEEE International Conference on Information Reuse & Integration.

[7]  David Maxwell Chickering,et al.  Using Temporal Data for Making Recommendations , 2001, UAI.

[8]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[9]  Kaiming He,et al.  Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.

[10]  Harris Wu,et al.  Evaluating Web-based Question Answering Systems , 2002, LREC.

[11]  Alan F. Blackwell,et al.  An empirical investigation of code completion usage by professional software developers , 2015, PPIG.

[12]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[13]  Daniel P. W. Ellis,et al.  Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems , 2015, ArXiv.

[14]  Mira Mezini,et al.  Intelligent Code Completion with Bayesian Networks , 2015, ACM Trans. Softw. Eng. Methodol..

[15]  Cristina V. Lopes,et al.  Collective Intelligence for Smarter API Recommendations in Python , 2016, 2016 IEEE 16th International Working Conference on Source Code Analysis and Manipulation (SCAM).

[16]  Mira Mezini,et al.  Learning from examples to improve code completion systems , 2009, ESEC/SIGSOFT FSE.

[17]  Wei Xu,et al.  Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.

[18]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[19]  Mik Kersten,et al.  How are lava software developers using the eclipse IDE , 2006 .

[20]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[21]  Eran Yahav,et al.  Code completion with statistical language models , 2014, PLDI.

[22]  Guy Shani,et al.  An MDP-Based Recommender System , 2002, J. Mach. Learn. Res..