Learning for semantic parsing and natural language generation using statistical machine translation techniques

One of the main goals of natural language processing (NLP) is to build automated systems that can understand and generate human languages. This goal has so far remained elusive. Existing hand-crafted systems can provide in-depth analysis of domain sub-languages, but are often notoriously fragile and costly to build. Existing machine-learned systems are considerably more robust, but are limited to relatively shallow NLP tasks. In this thesis, we present novel statistical methods for robust natural language understanding and generation. We focus on two important sub-tasks, semantic parsing and tactical generation. The key idea is that both tasks can be treated as translation between natural languages and formal meaning representation languages, and can therefore be performed using state-of-the-art statistical machine translation techniques. Specifically, we use a technique called synchronous parsing, which has been extensively used in syntax-based machine translation, as the unifying framework for semantic parsing and tactical generation. The parsing and generation algorithms learn all of their linguistic knowledge from annotated corpora, and can handle natural-language sentences that are conceptually complex. A nice feature of our algorithms is that the semantic parsers and tactical generators share the same learned synchronous grammars. Moreover, charts are used as the unifying language-processing architecture for efficient parsing and generation. The generators can therefore be said to be the inverse of the parsers, an elegant property that has been widely advocated. Furthermore, we show that our parsers and generators can handle formal meaning representation languages containing logical variables, including predicate logic. Our basic semantic parsing algorithm is called WASP. Most of the other parsing and generation algorithms presented in this thesis are extensions of WASP or its inverse.
We demonstrate the effectiveness of our parsing and generation algorithms by performing experiments in two real-world, restricted domains. Experimental results show that our algorithms are more robust and accurate than the best current systems that require similar supervision. Our work is also the first attempt to use the same automatically learned grammar for both parsing and generation. Unlike previous systems that require manually constructed grammars and lexicons, our systems require much less knowledge engineering and can be easily ported to other languages and domains.
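
The idea that a parser and a generator can share one synchronous grammar can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy grammar, the rule format, and the names RULES, derive, parse and generate are hypothetical and are not taken from the thesis. The sketch also enumerates all derivations by brute force instead of using chart-based synchronous parsing, and it has no learned weights, so it shows only the shared-grammar property, not WASP itself.

```python
from itertools import product

# Hypothetical toy synchronous grammar (not from the thesis).
# Each rule pairs a natural-language template with a meaning-representation
# template; "{0}", "{1}", ... mark child nonterminals linked on both sides.
RULES = {
    "QUERY": [("what is {0}", "answer({0})", ["FORM"])],
    "FORM": [
        ("the capital of {0}", "capital({0})", ["STATE"]),
        ("the population of {0}", "population({0})", ["STATE"]),
    ],
    "STATE": [
        ("texas", "stateid('texas')", []),
        ("ohio", "stateid('ohio')", []),
    ],
}


def derive(symbol):
    """Yield every (sentence, meaning) pair derivable from `symbol`.

    Both sides are built from the same derivation, which is what makes the
    grammar synchronous. The toy grammar is non-recursive, so this ends.
    """
    for nl_template, mr_template, children in RULES[symbol]:
        child_pairs = [list(derive(child)) for child in children]
        for combo in product(*child_pairs):
            nl_parts = [nl for nl, _ in combo]
            mr_parts = [mr for _, mr in combo]
            yield nl_template.format(*nl_parts), mr_template.format(*mr_parts)


# One set of derivations serves both directions.
PAIRS = list(derive("QUERY"))
PARSE_TABLE = {nl: mr for nl, mr in PAIRS}
GENERATE_TABLE = {mr: nl for nl, mr in PAIRS}


def parse(sentence):
    """Semantic parsing: sentence -> meaning representation (or None)."""
    return PARSE_TABLE.get(sentence)


def generate(meaning):
    """Tactical generation: meaning representation -> sentence (or None)."""
    return GENERATE_TABLE.get(meaning)


if __name__ == "__main__":
    mr = parse("what is the capital of texas")
    print(mr)
    print(generate(mr))
```

Because both lookup tables are built from the same derivations, generate inverts parse on every sentence the toy grammar covers. A realistic system would replace the enumeration with a chart and rank competing derivations with a learned statistical model.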
