Data-driven sentence generation with non-isomorphic trees

The abstract structures from which generation naturally starts often contain no functional nodes, whereas surface-syntactic structures, or the chain of tokens in a linearized tree, contain all of them. Data-driven linguistic generation therefore needs to cope with projection between non-isomorphic structures that differ in both topology and number of nodes. So far, such projection has been a challenge in data-driven generation and has largely been avoided. We present a fully stochastic generator that is able to cope with projection between non-isomorphic structures. The generator starts from PropBank-like structures and consists of a cascade of SVM-classifier-based submodules that map the input structures onto sentences in a series of transitions. The generator has been evaluated for English on the Penn Treebank and for Spanish on the multi-layered AnCora-UPF corpus.
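To make the notion of a non-isomorphic projection concrete, the following is a minimal sketch of a transition cascade, not the generator described here. It assumes a toy predicate-argument input, and every name in it (`Node`, `needs_determiner`, `add_functional_nodes`, `linearize`, `realize`, `FORMS`) is hypothetical. Hand-written rules stand in for the SVM classifiers that a trained system would consult at each decision point.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A tree node; `children` holds dependents in no particular order."""
    lemma: str
    pos: str
    role: str = "ROOT"
    children: List["Node"] = field(default_factory=list)


def count_nodes(node: Node) -> int:
    """Number of nodes in the tree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


def needs_determiner(node: Node) -> bool:
    # Assumption: a toy rule standing in for a trained classifier's decision.
    return node.pos == "NN"


def add_functional_nodes(node: Node) -> Node:
    """Transition 1: copy the tree, inserting function words the input lacks."""
    children = [add_functional_nodes(child) for child in node.children]
    if needs_determiner(node):
        children.insert(0, Node("the", "DT", "det"))
    return Node(node.lemma, node.pos, node.role, children)


def linearize(node: Node) -> List[Node]:
    """Transition 2: order a head and its dependents.

    Toy rule (an assumption): determiners and A0 arguments precede the head,
    everything else follows it.
    """
    before = [c for c in node.children if c.role in ("det", "A0")]
    after = [c for c in node.children if c.role not in ("det", "A0")]
    ordered: List[Node] = []
    for child in before:
        ordered.extend(linearize(child))
    ordered.append(node)
    for child in after:
        ordered.extend(linearize(child))
    return ordered


# Assumption: a tiny lookup table standing in for morphological generation.
FORMS: Dict[Tuple[str, str], str] = {("chase", "VBZ"): "chases"}


def realize(tokens: List[Node]) -> str:
    """Transition 3: map each (lemma, pos) pair to a surface form."""
    return " ".join(FORMS.get((t.lemma, t.pos), t.lemma) for t in tokens)


def generate(root: Node) -> str:
    """Run the cascade: add nodes, linearize, then inflect."""
    return realize(linearize(add_functional_nodes(root)))


example = Node("chase", "VBZ", "ROOT",
               [Node("dog", "NN", "A0"), Node("cat", "NN", "A1")])

print(count_nodes(example))                        # 3
print(count_nodes(add_functional_nodes(example)))  # 5
print(generate(example))                           # the dog chases the cat
```

The input tree has three nodes and the intermediate tree has five, so the two structures are not isomorphic: the first transition has to decide where new nodes are introduced, which is exactly the step that a node-for-node mapping cannot express.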
