HISyn: human learning-inspired natural language programming

Natural Language (NL) programming automatically synthesizes code from inputs expressed in natural language, and it has recently attracted growing interest. Recent solutions, however, all require many labeled training examples because of their data-driven nature. This paper proposes an NLU-driven approach, a new approach inspired by how humans learn programming. It centers on Natural Language Understanding (NLU) and draws on a novel graph-based mapping algorithm, forgoing the need for large numbers of labeled examples. The resulting NL programming framework, HISyn, uses no training examples yet achieves synthesis accuracy comparable to that of data-driven methods trained on hundreds of training examples. HISyn also demonstrates advantages in interpretability, error diagnosis support, and cross-domain extensibility.
