Submitted to Computer Speech and Language

Stochastic Natural Language Generation for Spoken Dialog Systems

We describe a corpus-based approach to natural language generation (NLG). The approach has been implemented as a component of a spoken dialog system, and a series of evaluations was carried out. Our system uses n-gram language models, which have been found useful in other language technology applications, in a generative mode. It is not yet clear whether simple n-grams can adequately model human language generation in general, but we show that this ubiquitous modeling technique can be applied successfully to natural language generation for spoken dialog systems. In this paper, we discuss applying corpus-based stochastic language generation at two levels: content selection and sentence planning/realization. At the content selection level, output utterances are modeled by bigrams, and the appropriate attributes are chosen using bigram statistics. In sentence planning and realization, corpus utterances are modeled by n-grams of varying length, and new utterances are generated stochastically. Through this work, we show that a simple statistical model alone can generate appropriate language for a spoken dialog system. The results suggest a promising avenue for using a statistical approach in future NLG systems.

A. H. Oh: Stochastic Natural Language Generation

[Figure 1: NLU and NLG. Panels: Natural Language Understanding, Natural Language Generation. Labels: Surface Realization, Semantic (Syntactic) Representation.]
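The idea of running an n-gram language model "in a generative mode" can be illustrated with a small sketch. This is not the system described in the paper: the toy corpus, the function names `train_ngrams` and `generate`, the boundary tokens `<s>` and `</s>`, and the use of unsmoothed counts are all assumptions made for illustration. The sketch counts which token follows each context in a set of example utterances, then samples a new utterance one token at a time in proportion to those counts.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical utterance-boundary tokens


def train_ngrams(utterances, n=2):
    """Map each (n-1)-token context to a Counter of the tokens that follow it."""
    counts = defaultdict(Counter)
    for utterance in utterances:
        tokens = [START] * (n - 1) + utterance.split() + [END]
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            counts[context][tokens[i]] += 1
    return counts


def generate(counts, n=2, rng=None, max_len=30):
    """Sample one utterance token by token, weighted by the n-gram counts."""
    rng = rng or random.Random()
    context = (START,) * (n - 1)
    words = []
    while len(words) < max_len:
        options = counts.get(context)
        if not options:  # unseen context: stop rather than guess
            break
        candidates = list(options)
        weights = [options[c] for c in candidates]
        word = rng.choices(candidates, weights=weights, k=1)[0]
        if word == END:
            break
        words.append(word)
        context = (context + (word,))[1:]  # slide the context window
    return " ".join(words)


if __name__ == "__main__":
    # Invented example utterances, not taken from the paper's corpus.
    corpus = [
        "what time would you like to leave",
        "what city would you like to fly to",
        "would you like to leave in the morning",
    ]
    model = train_ngrams(corpus, n=2)
    print(generate(model, n=2, rng=random.Random(0)))
```

Because every step samples from continuations observed in the corpus, each adjacent word pair in the output is one that occurred in the training utterances, while the utterance as a whole may be new. A larger `n` keeps the output closer to the corpus at the cost of less variety.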
