Spoken language generation

There are long traditions of research in both natural language generation and speech synthesis (Carbonell, 1970; Simmons & Slocum, 1975; Sproat & Olive, 1995; Young & Fallside, 1979). Research in natural language generation has focused on the output of paragraph-length texts, given as input either a meaning representation or tabular data resulting from a database query (Cahill et al., 2001; Hitzeman, Black, Taylor, Mellish, & Oberlander, 1998; Kittredge, Polguère, & Goldberg, 1986, 1991; McDonald, 1983; Meteer, 1991; Scott & Sieckenius de Souza, 1990), or on the production of instructions or explanations in written tutorial dialogue, given as input a plan-based representation (Moore & Paris, 1993). Research in speech synthesis has focused on producing high-quality output given a (possibly marked-up) textual string as input (Beutnagel, Conkie, Schroeter, Stylianou, & Syrdal, 1999; Black & Lenzo, 2000; Sproat & Olive, 1995). Recently, however, many applications have emerged that require spoken language output, such as spoken dialogue systems, briefing systems, speech-to-speech translation, automated sports commentators, and directions systems, and research relating these two strands of work has increased. This research is motivated by several goals. First, there is the potential for improving the quality of synthesis by using the generator to provide information about the purpose, meaning, and linguistic structure of the utterance to the synthesis process. A second goal is to use natural language generation to make it possible to customize systems that generate spoken language very quickly for individual users, groups of users, or new domains. There are a number of open research challenges. These include the generation of utterances in interactive dialogue that are sensitive to listeners' working memory constraints, the generation of speech acts whose purpose is other than to describe or inform, determining the appropriate prosody for spoken output, incorporating corpus-based or statistical knowledge into the generation and synthesis processes, generation of utterances in real time in dynamic environments, providing a deeper level of integration between generation and synthesis, and developing methods for evaluating the efficacy of different generation techniques.

Computer Speech and Language (2002) 16, 273–281. doi:10.1016/S0885-2308(02)00029-3. Available online at http://www.idealibrary.com

[1]  Mari Ostendorf,et al.  Joint prosody prediction and unit selection for concatenative speech synthesis , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[2]  Owen Rambow,et al.  On the need for domain communication knowledge , 1991 .

[3]  Owen Rambow,et al.  Applied Text Generation , 1992, ANLP.

[4]  Johanna D. Moore,et al.  Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information , 1993, CL.

[5]  Chris Brew,et al.  Stochastic text generation , 2000, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[6]  Stephen Young Probabilistic methods in spoken–dialogue systems , 2000, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[7]  Kees van Deemter,et al.  Context modeling and the generation of spoken discourse , 1997, Speech Commun..

[8]  Robert F. Simmons,et al.  Generating English discourse from semantic networks , 1972, CACM.

[9]  Julia Hirschberg,et al.  Exploring features from natural language generation for prosody modeling , 2002, Comput. Speech Lang..

[10]  Richard Shillcock,et al.  Proceedings of EUROSPEECH-1991. , 1991 .

[11]  Alain Polguère,et al.  Synthesizing Weather Forecasts from Formatted Data , 1986, COLING.

[12]  Joseph Polifroni,et al.  Formal and natural language generation in the Mercury conversational system , 2000, INTERSPEECH.

[13]  Alan W. Black,et al.  Limited domain synthesis , 2000, INTERSPEECH.

[14]  Kuldip K. Paliwal,et al.  Speech Coding and Synthesis , 1995 .

[15]  Alexander I. Rudnicky,et al.  Stochastic natural language generation for spoken dialog systems , 2002, Comput. Speech Lang..

[16]  Amanda J. Stent,et al.  Dialogue Systems as Conversational Partners: Applying Conversation Acts Theory to Natural Language G , 2001 .

[17]  M. Meteer Bridging the generation gap between text planning and linguistic realization , 1991 .

[18]  Benoit Lavoie,et al.  A Fast and Portable Realizer for Text Generation Systems , 1997, ANLP.

[19]  Chris Mellish,et al.  On the use of automatically generated discourse-level information in a concept-to-speech synthesis system , 1998, ICSLP.

[20]  Jaime R. Carbonell,et al.  AI in CAI : An artificial intelligence approach to computer-assisted instruction , 1970 .

[21]  David R. Traum,et al.  Conversation Acts in Task-Oriented Spoken Dialogue , 1992, Comput. Intell..

[22]  Marilyn A. Walker,et al.  User-tailored generation for spoken dialogue: an experiment , 2002, INTERSPEECH.

[23]  Kevin Knight,et al.  Generation that Exploits Corpus-Based Statistical Knowledge , 1998, ACL.

[24]  Herbert H. Clark,et al.  Grounding in communication , 1991, Perspectives on socially shared cognition.

[25]  Esther Klabbers,et al.  Segmental and prosodic improvements to speech generation , 2000 .

[26]  Robert E. Schapire,et al.  A Brief Introduction to Boosting , 1999, IJCAI.

[27]  Marilyn A. Walker,et al.  Training a Sentence Planner for Spoken Dialogue Using Boosting , 2002 .

[28]  Emiel Krahmer,et al.  From data to speech: a general approach , 2001, Natural Language Engineering.

[29]  Daniel Marcu,et al.  From Local to Global Coherence: A Bottom-Up Approach to Text Planning , 1997, AAAI/IAAI.

[30]  David D. McDonald Description directed control: its implications for natural language generation , 1986 .

[31]  Stephanie D. Teasley,et al.  Perspectives on socially shared cognition , 1991 .

[32]  Chris Mellish,et al.  Evaluation in the context of natural language generation , 1998, Comput. Speech Lang..

[33]  Chris Mellish,et al.  Current research in natural language generation , 1990 .

[34]  K. McKeown,et al.  Discourse Strategies for Generating Natural-Language Text , 1985, Artif. Intell..

[35]  Marilyn A. Walker,et al.  MATCH: An Architecture for Multimodal Dialogue Systems , 2002, ACL.

[36]  Kees van Deemter,et al.  From RAGS to RICHES: Exploiting the Potential of a Flexible Generation Architecture , 2001, ACL.

[37]  Chris Mellish,et al.  Experiments Using Stochastic Search for Text Planning , 1998, INLG.

[38]  Joseph Polifroni,et al.  Organization, communication, and control in the GALAXY-II conversational system , 1999, EUROSPEECH.

[39]  Mark Steedman,et al.  Animated conversation: rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents , 1994, SIGGRAPH.

[40]  Irene Langkilde Forest-Based Statistical Sentence Generation , 2000, ANLP.

[41]  Mark-Jan Nederhof,et al.  Robust grammatical analysis for spoken dialogue systems , 1999, Natural Language Engineering.

[42]  W. Levelt,et al.  Speaking: From Intention to Articulation , 1990 .

[43]  Johanna D. Moore,et al.  Towards a Principled Representation of Discourse Plans , 1994, Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society.

[44]  Mari Ostendorf,et al.  Efficient integrated response generation from multiple targets using weighted finite state transducers , 2002, Comput. Speech Lang..

[45]  Alexander I. Rudnicky,et al.  Task and domain specific modelling in the Carnegie Mellon communicator system , 2000, INTERSPEECH.

[46]  Gregory A. Sanders,et al.  DARPA communicator: cross-system results for the 2001 evaluation , 2002, INTERSPEECH.

[47]  Clarisse Sieckenius de Souza,et al.  Getting the message across in RST-based text generation , 1990 .

[48]  F. Fallside,et al.  Speech synthesis from concept: A method for speech output from information systems , 1979 .

[49]  Julia Hirschberg,et al.  Learning prosodic features using a tree representation , 2001, INTERSPEECH.

[50]  Srinivas Bangalore,et al.  Exploiting a Probabilistic Hierarchical Model for Generation , 2000, COLING.

[51]  HighWire Press Philosophical Transactions of the Royal Society of London , 1781, The London Medical Journal.

[52]  Ehud Reiter,et al.  Has a Consensus NL Generation Architecture Appeared, and is it Psycholinguistically Plausible? , 1994, INLG.

[53]  Johanna D. Moore,et al.  A strategy for generating evaluative arguments , 2000, INLG.

[54]  Scott Axelrod Natural Language Generation in the IBM Flight Information System , 2000 .