Scaling a Natural Language Generation System

A key goal in natural language generation (NLG) is to enable fast generation even with large vocabularies, grammars, and worlds. In this work, we build upon a recently proposed NLG system, Sentence Tree Realization with UCT (STRUCT). We describe four enhancements to this system: (i) pruning the grammar based on the world and the communicative goal, (ii) intelligently caching and pruning the combinatorial space of semantic bindings, (iii) reusing the lookahead search tree at different search depths, and (iv) learning and using a search control heuristic. We evaluate the resulting system on three datasets of increasing size and complexity, the largest of which has a vocabulary of about 10K words, a grammar of about 32K lexicalized trees, and a world with about 11K entities and 23K relations between them. Our results show that the system has a median generation time of 8.5s and finds the best sentence on average within 25s. These results are based on a sequential, interpreted implementation and are significantly better than the state of the art for planning-based NLG systems.
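STRUCT's lookahead search is based on UCT (Monte Carlo tree search with the UCB1 selection rule), which balances exploiting high-reward partial sentences against exploring less-visited ones. As a minimal sketch of that selection step, the following is illustrative only: the `(mean_reward, visit_count)` representation of child nodes is a hypothetical simplification, since the actual STRUCT nodes carry partial sentence trees and semantic bindings.

```python
import math

def ucb1_select(children, total_visits, c=1.414):
    """Return the index of the child maximizing UCB1:
    mean reward plus an exploration bonus that shrinks with visits.
    `children` is a list of (mean_reward, visit_count) pairs -- a
    hypothetical stand-in for STRUCT's richer search-tree nodes."""
    best_i, best_score = 0, float("-inf")
    for i, (mean_reward, visits) in enumerate(children):
        if visits == 0:
            return i  # expand any unvisited child before scoring others
        score = mean_reward + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best_i, best_score = i, score
    return best_i
```

A rarely visited child with a promising mean reward can outrank a well-explored one, which is what lets the search reallocate simulations; reusing the lookahead tree across search depths (enhancement iii) amounts to keeping these visit counts and reward estimates instead of discarding them after each decision.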
