Statistical Generation: Three Methods Compared and Evaluated

Statistical NLG has largely meant n-gram modelling, which has the considerable advantages of lending robustness to NLG systems and of making automatic adaptation to new domains from raw corpora possible. On the downside, n-gram models are expensive to use as selection mechanisms and have a built-in bias towards shorter realisations. This paper looks at treebank-training of generators, an alternative method for building statistical models for NLG from raw corpora, and at two different ways of using treebank-trained models during generation. Results show that the treebank-trained generators achieve improvements over a baseline of random selection similar to those of a 2-gram generator, but at a much lower cost and without the 2-gram generator's strong preference for shorter realisations.
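The length bias mentioned above follows directly from how n-gram models score candidates: a sentence's score is a sum of per-bigram log-probabilities, each of which is negative, so every additional word can only lower the total. A minimal sketch, using made-up uniform bigram probabilities purely for illustration (not the paper's actual models or data):

```python
import math

def bigram_score(tokens, logp, unk):
    """Sum of bigram log-probabilities over a sentence, with boundary
    markers <s> and </s>. More negative totals are less probable."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(logp.get(pair, unk) for pair in zip(padded, padded[1:]))

# Toy setting: assume every bigram has the same probability 0.5,
# so the score is log(0.5) * (number of bigrams) and depends only
# on sentence length.
p = math.log(0.5)

short = "the door opened".split()             # 4 bigrams incl. boundaries
longer = "the door was opened slowly".split() # 6 bigrams incl. boundaries

s_short = bigram_score(short, {}, unk=p)
s_long = bigram_score(longer, {}, unk=p)

# The longer realisation scores strictly lower, so a generator that
# picks the highest-scoring candidate systematically prefers the
# shorter one, even when both are equally grammatical.
assert s_long < s_short
```

In practice the per-bigram probabilities differ, but the additive-penalty effect remains unless the ranking criterion is explicitly normalised for length.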
