Scaling a Natural Language Generation System

A key goal in natural language generation (NLG) is to enable fast generation even with large vocabularies, grammars, and worlds. In this work, we build upon a recently proposed NLG system, Sentence Tree Realization with UCT (STRUCT). We describe four enhancements to this system: (i) pruning the grammar based on the world and the communicative goal, (ii) intelligently caching and pruning the combinatorial space of semantic bindings, (iii) reusing the lookahead search tree at different search depths, and (iv) learning and using a search control heuristic. We evaluate the resulting system on three datasets of increasing size and complexity, the largest of which has a vocabulary of about 10K words, a grammar of about 32K lexicalized trees, and a world with about 11K entities and 23K relations between them. Our results show that the system has a median generation time of 8.5s and finds the best sentence on average within 25s. These results are based on a sequential, interpreted implementation and are significantly better than the state of the art for planning-based NLG systems.
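STRUCT's lookahead search is based on UCT (Monte Carlo tree search with the UCB1 selection rule), which balances exploiting high-reward partial sentences against exploring less-visited ones. As a minimal sketch of that selection step, the following is illustrative only: the `(mean_reward, visit_count)` representation of child nodes is a hypothetical simplification, since the actual STRUCT nodes carry partial sentence trees and semantic bindings.

```python
import math

def ucb1_select(children, total_visits, c=1.414):
    """Return the index of the child maximizing UCB1:
    mean reward plus an exploration bonus that shrinks with visits.
    `children` is a list of (mean_reward, visit_count) pairs -- a
    hypothetical stand-in for STRUCT's richer search-tree nodes."""
    best_i, best_score = 0, float("-inf")
    for i, (mean_reward, visits) in enumerate(children):
        if visits == 0:
            return i  # expand any unvisited child before scoring others
        score = mean_reward + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best_i, best_score = i, score
    return best_i
```

A rarely visited child with a promising mean reward can outrank a well-explored one, which is what lets the search reallocate simulations; reusing the lookahead tree across search depths (enhancement iii) amounts to keeping these visit counts and reward estimates instead of discarding them after each decision.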
