Statistical Generation: Three Methods Compared and Evaluated

Statistical NLG has largely meant n-gram modelling, which has the considerable advantages of lending robustness to NLG systems and of making automatic adaptation to new domains from raw corpora possible. On the downside, n-gram models are expensive to use as selection mechanisms and have a built-in bias towards shorter realisations. This paper looks at treebank-training of generators, an alternative method for building statistical models for NLG from raw corpora, and at two different ways of using treebank-trained models during generation. Results show that the treebank-trained generators achieve improvements over a baseline of random selection similar to those of a 2-gram generator, but at a much lower cost and without the 2-gram generator's strong preference for shorter realisations.
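The length bias mentioned above follows directly from how n-gram models score candidates: a sentence's score is a sum of per-bigram log-probabilities, each of which is negative, so every additional word can only lower the total. A minimal sketch, using made-up uniform bigram probabilities purely for illustration (not the paper's actual models or data):

```python
import math

def bigram_score(tokens, logp, unk):
    """Sum of bigram log-probabilities over a sentence, with boundary
    markers <s> and </s>. More negative totals are less probable."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(logp.get(pair, unk) for pair in zip(padded, padded[1:]))

# Toy setting: assume every bigram has the same probability 0.5,
# so the score is log(0.5) * (number of bigrams) and depends only
# on sentence length.
p = math.log(0.5)

short = "the door opened".split()             # 4 bigrams incl. boundaries
longer = "the door was opened slowly".split() # 6 bigrams incl. boundaries

s_short = bigram_score(short, {}, unk=p)
s_long = bigram_score(longer, {}, unk=p)

# The longer realisation scores strictly lower, so a generator that
# picks the highest-scoring candidate systematically prefers the
# shorter one, even when both are equally grammatical.
assert s_long < s_short
```

In practice the per-bigram probabilities differ, but the additive-penalty effect remains unless the ranking criterion is explicitly normalised for length.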
