Broad Coverage Multilingual Deep Sentence Generation with a Stochastic Multi-Level Realizer

Most known stochastic sentence generators are trained on syntactically annotated corpora and project the input to the surface in a single stage. In full-fledged text generation, however, sentence realization usually starts from semantic (predicate-argument) structures. To handle such structures, stochastic generators require semantically annotated or, better still, multilevel annotated corpora. Only then can they address crucial generation tasks such as sentence planning, linearization and morphologization. Multilevel annotated corpora are increasingly available for multiple languages. We take advantage of them and propose a multilingual deep stochastic sentence realizer that mirrors state-of-the-art research in semantic parsing. The realizer uses an SVM learning algorithm. A separate decoder is defined for each pair of adjacent levels of annotation. So far, we have evaluated the realizer on Chinese, English, German, and Spanish.
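The sketch below illustrates the multi-level architecture: one decoder per pair of adjacent annotation levels, chained from the semantic structure down to the surface string. It is a minimal sketch under our own assumptions. The three stages (semantic to syntactic, syntactic to linearized, linearized to surface) follow the tasks named above. The dictionary representations and the names `plan_sentence`, `linearize`, `morphologize` and `realize` are hypothetical, and the hand-written toy rules stand in for the trained SVM-based decoders. This is not the realizer's actual interface.

```python
from typing import Any, Callable, Dict, List, Tuple

# A decoder maps a structure at one annotation level to the next level down.
Decoder = Callable[[Any], Any]


def plan_sentence(sem: Dict[str, Any]) -> Dict[str, Any]:
    """Semantic -> syntactic: map argument roles to syntactic relations.

    Toy rule standing in for a trained (e.g. SVM-based) decoder.
    """
    role_to_rel = {"A0": "SBJ", "A1": "OBJ"}  # hypothetical role-to-relation mapping
    deps = [(role_to_rel[role], arg) for role, arg in sem["args"].items()]
    return {"head": sem["pred"], "feats": {"tense": sem["tense"]}, "deps": deps}


def linearize(tree: Dict[str, Any]) -> List[Tuple[str, Dict[str, str]]]:
    """Syntactic -> linearized: order the head and its dependents."""
    precedence = {"SBJ": 0, "HEAD": 1, "OBJ": 2}  # fixed SVO order, for illustration only
    items = [("HEAD", tree["head"], tree["feats"])]
    items += [(rel, lemma, {}) for rel, lemma in tree["deps"]]
    items.sort(key=lambda item: precedence[item[0]])
    return [(lemma, feats) for _, lemma, feats in items]


def morphologize(tokens: List[Tuple[str, Dict[str, str]]]) -> str:
    """Linearized -> surface: inflect each lemma according to its features."""
    irregular_past = {"see": "saw", "give": "gave"}  # tiny hypothetical lexicon
    words = []
    for lemma, feats in tokens:
        if feats.get("tense") == "past":
            # Crude fallback: regular past tense by appending "ed".
            words.append(irregular_past.get(lemma, lemma + "ed"))
        else:
            words.append(lemma)
    return " ".join(words)


def realize(structure: Any, decoders: List[Decoder]) -> Any:
    """Apply one decoder per pair of adjacent levels, top to bottom."""
    for decode in decoders:
        structure = decode(structure)
    return structure


semantic = {"pred": "see", "tense": "past", "args": {"A0": "John", "A1": "Mary"}}
print(realize(semantic, [plan_sentence, linearize, morphologize]))  # John saw Mary
```

In this sketch each decoder sees only the output of the level directly above it, so any single stage could be retrained or swapped (for instance, per language) without changing the others.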
