From Natural Language Specifications to Program Input Parsers

We present a method for automatically generating input parsers from English specifications of input file formats. We use a Bayesian generative model to capture relevant natural language phenomena and translate the English specification into a specification tree, which is then translated into a C++ input parser. We model the problem as a joint dependency parsing and semantic role labeling task. Our method is based on two sources of information: (1) the correlation between the text and the specification tree and (2) noisy supervision as determined by the success of the generated C++ parser in reading input examples. On a dataset of input format specifications from the ACM International Collegiate Programming Contest (written in English for human readers, with no intention of supporting automated processing), our approach achieves an F-score of 80.0%, compared to 66.7% for a state-of-the-art semantic parser.
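To make the idea of a specification tree and of feedback from input examples more concrete, here is a minimal sketch. It is not the paper's model or its generated C++ code: the `Field` class, the `try_parse` function, and the sample format ("an integer n, followed by n pairs of integers x and y") are all hypothetical, and the tree is interpreted directly in Python rather than translated to C++. The sketch only illustrates how a tree describing the input layout can drive a reader, and how a failed read on an example input can signal that a candidate tree is wrong.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class SpecMismatch(Exception):
    """Raised when the input does not fit the specification tree."""


@dataclass
class Field:
    """One node of a hypothetical specification tree (not the paper's format)."""
    name: str
    kind: str = "int"              # type of a leaf token: "int" or "str"
    repeat: Optional[str] = None   # name of an earlier int field holding the count
    children: List["Field"] = field(default_factory=list)


class Tokens:
    """Whitespace-separated tokens with a read position."""

    def __init__(self, text: str) -> None:
        self.items = text.split()
        self.pos = 0

    def next(self) -> str:
        if self.pos >= len(self.items):
            raise SpecMismatch("input ended early")
        token = self.items[self.pos]
        self.pos += 1
        return token

    def exhausted(self) -> bool:
        return self.pos == len(self.items)


def read_once(node: Field, tokens: Tokens, env: Dict[str, int]):
    """Read one occurrence of a node: a group of children or a single token."""
    if node.children:
        return {child.name: read_node(child, tokens, env) for child in node.children}
    token = tokens.next()
    if node.kind == "int":
        try:
            value = int(token)
        except ValueError:
            raise SpecMismatch(f"expected an integer for {node.name}, got {token!r}")
        env[node.name] = value     # remember it so later nodes can use it as a count
        return value
    return token


def read_node(node: Field, tokens: Tokens, env: Dict[str, int]):
    """Read a node, repeating it if it refers to a count field."""
    if node.repeat is None:
        return read_once(node, tokens, env)
    if node.repeat not in env:
        raise SpecMismatch(f"unknown count field {node.repeat!r}")
    return [read_once(node, tokens, env) for _ in range(env[node.repeat])]


def try_parse(spec: Field, text: str) -> Optional[dict]:
    """Return the parsed input, or None if the example does not fit the tree."""
    tokens = Tokens(text)
    try:
        result = read_node(spec, tokens, {})
    except SpecMismatch:
        return None
    return result if tokens.exhausted() else None


# Hypothetical tree for: "an integer n, followed by n pairs of integers x and y".
SPEC = Field("input", children=[
    Field("n"),
    Field("points", repeat="n", children=[Field("x"), Field("y")]),
])
```

Under these assumptions, `try_parse(SPEC, "2\n1 2\n3 4\n")` yields a dictionary with `n` and a list of two points, while an example whose token count disagrees with `n` yields `None`. A success-or-failure signal of this kind is what the abstract refers to as noisy supervision, although the paper obtains it by running the generated C++ parser.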

Regina Barzilay | Fan Long | Martin C. Rinard | Tao Lei
