Predictive Synthesis of API-Centric Code

Today’s programmers, especially data science practitioners, make heavy use of data-processing libraries (APIs) such as PyTorch, TensorFlow, NumPy, and Pandas. Program synthesizers can provide significant coding assistance to this community of users; however, program synthesis can also be slow because of its enormous search spaces. In this work, we examine ways in which machine learning can be used to accelerate enumerative program synthesis. We present a deep-learning-based model that predicts the sequence of API functions needed to go from a given input to a desired output, both being numeric vectors. Our work is based on two insights. First, it is possible to learn, from a large number of input-output examples, to predict the API function likely needed in a given situation. Second, and crucially, it is also possible to learn to compose API functions into a sequence, given an input and the desired final output, without explicitly knowing the intermediate values. We show that predictions from our model variants speed up an enumerative program synthesizer. These speedups significantly exceed those obtained by previous approaches (e.g., DeepCoder [7]) that use ML models in enumerative synthesis.
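To make the idea of prediction-guided enumeration concrete, the following is a minimal sketch and not the model or synthesizer described above. It assumes a toy API of four pure-Python vector functions (hypothetical stand-ins for library calls) and a hand-written `scores` dictionary that plays the role of a learned model's per-function predictions. The synthesizer enumerates API sequences shortest first; when scores are supplied, it tries the higher-scoring sequences of each length earlier.

```python
from itertools import product
from math import prod

# Hypothetical stand-ins for library API functions on numeric vectors.
API = {
    "sort": lambda v: sorted(v),
    "reverse": lambda v: v[::-1],
    "cumsum": lambda v: [sum(v[: i + 1]) for i in range(len(v))],
    "double": lambda v: [2 * x for x in v],
}


def run(program, vector):
    """Apply a sequence of API function names to a vector, left to right."""
    for name in program:
        vector = API[name](vector)
    return vector


def enumerate_programs(max_len, scores=None):
    """Yield API sequences of length 1..max_len, shortest first.

    If `scores` (function name -> predicted likelihood) is given, sequences
    of the same length are ordered by the product of their scores, highest
    first. Without scores, the order is plain lexicographic enumeration.
    """
    names = list(API)
    for length in range(1, max_len + 1):
        candidates = list(product(names, repeat=length))
        if scores is not None:
            candidates.sort(key=lambda p: -prod(scores[n] for n in p))
        yield from candidates


def synthesize(inp, out, max_len=3, scores=None):
    """Return (program, candidates_tried); program is None if none fits."""
    tried = 0
    for program in enumerate_programs(max_len, scores):
        tried += 1
        if run(program, inp) == out:
            return program, tried
    return None, tried


if __name__ == "__main__":
    inp, out = [3, 1, 2], [12, 8, 6]
    # Assumed scores, standing in for a trained model's predictions.
    guess = {"cumsum": 0.4, "double": 0.3, "reverse": 0.25, "sort": 0.05}
    print("unguided:", synthesize(inp, out))
    print("guided:  ", synthesize(inp, out, scores=guess))
```

In this toy setting the guided search reaches a consistent program after fewer candidates than the unguided one, because sequences built from functions the scores favor are tried first. A learned model would produce such scores from the input-output pair itself rather than from a fixed dictionary.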

[1] Premkumar T. Devanbu, et al. On the naturalness of software, 2016, Commun. ACM.

[2] Isil Dillig, et al. Automated Migration of Hierarchical Data to Relational Tables using Programming-by-Example, 2017, Proc. VLDB Endow.

[3] Isil Dillig, et al. Component-based synthesis of table consolidation and transformation tasks from examples, 2016, PLDI.

[4] Steven Hardy, et al. Automatic induction of LISP functions, 1974.

[5] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, ArXiv.

[6] Brad A. Myers, et al. Improving API usability, 2016, Commun. ACM.

[7] Sebastian Nowozin, et al. DeepCoder: Learning to Write Programs, 2016, ICLR.

[8] Sumit Gulwani, et al. Learning Syntactic Program Transformations from Examples, 2016, 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE).

[9] Chanchal Kumar Roy, et al. RACK: Automatic API Recommendation Using Crowdsourced Knowledge, 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[10] Rajeev Alur, et al. TRANSIT: specifying protocols with concolic snippets, 2013, PLDI.

[11] David E. Shaw, et al. Inferring LISP Programs From Examples, 1975, IJCAI.

[12] Koushik Sen, et al. AutoPandas: neural-backed generators for program synthesis, 2019, Proc. ACM Program. Lang.

[13] Armando Solar-Lezama, et al. Learning to Infer Program Sketches, 2019, ICML.

[14] Pushmeet Kohli, et al. RobustFill: Neural Program Learning under Noisy I/O, 2017, ICML.

[15] Rishabh Singh, et al. BUSTLE: Bottom-up program-Synthesis Through Learning-guided Exploration, 2020, ICLR.

[16] Koushik Sen, et al. Retrieval on source code: a neural code search, 2018, MAPL@PLDI.

[17] Sumit Gulwani, et al. Oracle-guided component-based program synthesis, 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[18] Jacques Klein, et al. FaCoY – A Code-to-Code Search Engine, 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[19] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.

[20] Isil Dillig, et al. Program synthesis using conflict-driven learning, 2017, PLDI.

[21] Rishabh Singh, et al. Neural Program Synthesis with a Differentiable Fixer, 2020, ArXiv.

[22] TF-Coder: Program Synthesis for Tensor Manipulations, 2020, ACM Transactions on Programming Languages and Systems.

[23] Martin P. Robillard, et al. What Makes APIs Hard to Learn? Answers from Developers, 2009, IEEE Software.

[24] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.

[25] Lihong Li, et al. Neuro-Symbolic Program Synthesis, 2016, ICLR.

[26] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[27] Baishakhi Ray, et al. On Multi-Modal Learning of Editing Source Code, 2021, 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[28] Wei Xu, et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN), 2014, ICLR.

[29] Aws Albarghouthi, et al. MapReduce program synthesis, 2016, PLDI.

[30] Sumit Gulwani, et al. Automating string processing in spreadsheets using input-output examples, 2011, POPL '11.

[31] Sanjit A. Seshia, et al. Combinatorial sketching for finite programs, 2006, ASPLOS XII.

[32] Matthew J. Hausknecht, et al. Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis, 2018, ICLR.

[33] Zhenchang Xing, et al. API Method Recommendation without Worrying about the Task-API Knowledge Gap, 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[34] Navid Yaghmazadeh, et al. SQLizer: query synthesis from natural language, 2017, Proc. ACM Program. Lang.

[35] Rishabh Singh, et al. Synthetic Datasets for Neural Program Synthesis, 2019, ICLR.

[36] Zohar Manna, et al. A Deductive Approach to Program Synthesis, 1979, TOPL.

[37] Sumit Gulwani, et al. FlashExtract: a framework for data extraction by examples, 2014, PLDI.

[38] Collin McMillan, et al. Portfolio: finding relevant functions and their usage, 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[39] Koushik Sen, et al. Aroma: code recommendation via structural code search, 2018, Proc. ACM Program. Lang.