Boosting component-based synthesis with control structure recommendation

Component-based synthesis is an important research area in program synthesis. API-based synthesis is a subfield of component-based synthesis in which the component library consists of Java APIs. Unlike earlier work in API-based synthesis, which can only generate loop-free programs composed of API calls, the state-of-the-art tool FrAngel can generate programs with control structures. However, when generating control structures, FrAngel samples the different types of control structures purely at random. Given information about the desired method (such as its name and input/output types), experienced programmers can form an initial idea of which control structures are likely to appear in its implementation. This knowledge about control structures can be learned from high-quality projects. In this paper, we propose a novel deep-learning-based approach for recommending control structures for API-based synthesis. We propose a neural network that jointly embeds the natural language description, method name, and input/output types into high-dimensional vectors to predict the likely control structures of the desired method. We train the model on a codebase of high-quality Java projects from GitHub and integrate it into the synthesizer: its predictions are fed to the API-based synthesizer to guide the sampling of control structures. Experimental results on 40 programming tasks show that our approach effectively improves the efficiency of synthesis.
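
The idea of guiding control structure sampling with model predictions can be illustrated with a minimal sketch. It is not the implementation from the paper: the list of control structure kinds, the function names, the example probabilities, and the `epsilon` mixing weight are all assumptions made for illustration. The sketch blends hypothetical predicted probabilities with a uniform distribution, so that a kind the model considers unlikely can still be sampled, and then draws control structure kinds from the blended weights.

```python
import random

# Hypothetical control structure kinds; the real set depends on the synthesizer.
KINDS = ["if", "for", "foreach", "while"]


def mix_with_uniform(predicted, epsilon=0.2):
    """Blend predicted probabilities with a uniform distribution.

    epsilon is an assumed exploration weight: every kind keeps at least
    epsilon / len(KINDS) probability even if the model predicts 0 for it.
    """
    uniform = 1.0 / len(KINDS)
    return {
        kind: (1.0 - epsilon) * predicted.get(kind, 0.0) + epsilon * uniform
        for kind in KINDS
    }


def sample_kind(weights, rng):
    """Draw one control structure kind in proportion to its weight."""
    kinds = list(weights)
    return rng.choices(kinds, weights=[weights[k] for k in kinds], k=1)[0]


# Made-up model output for a method that probably iterates over a collection.
predicted = {"if": 0.10, "for": 0.70, "foreach": 0.15, "while": 0.05}
weights = mix_with_uniform(predicted)

# Count how often each kind is drawn under the guided distribution.
rng = random.Random(0)
counts = {kind: 0 for kind in KINDS}
for _ in range(10000):
    counts[sample_kind(weights, rng)] += 1
```

With these made-up probabilities, the guided sampler proposes `for` loops most often while still occasionally trying the other kinds, whereas unguided random sampling would spend about as much effort on each kind regardless of the method being synthesized.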
