Hybrid Natural Language Generation from Lexical Conceptual Structures

This paper describes Lexogen, a system for generating natural-language sentences from Lexical Conceptual Structure, an interlingual representation. The system has been developed as part of a Chinese–English Machine Translation (MT) system; however, it is designed to be used for many other MT language pairs and natural language applications. The contributions of this work include: (1) development of a large-scale hybrid natural language generation system with language-independent components; (2) enhancements to an interlingual representation and an associated algorithm for generation from ambiguous input; (3) development of an efficient, reusable, language-independent linearization module with a grammar description language that can be used with other systems; (4) improvements to an earlier algorithm for hierarchically mapping thematic roles to surface positions; and (5) development of a diagnostic tool for lexicon coverage and correctness, and use of the tool to verify English, Spanish, and Chinese lexicons. An evaluation of Chinese–English translation quality shows performance comparable to that of a commercial translation system. The generation system can also be extended to other languages; this is demonstrated and evaluated for Spanish.
