Paper information - Spoken language generation

Spoken language generation

There are long traditions of research in both natural language generation and speech synthesis (Carbonell, 1970; Simmons & Slocum, 1975; Sproat & Olive, 1995; Young & Fallside, 1979). Research in natural language generation has focused on the output of paragraph-length texts, given as input either a meaning representation or tabular data resulting from a database query (Cahill et al., 2001; Hitzeman, Black, Taylor, Mellish, & Oberlander, 1998; Kittredge, Polguère, & Goldberg, 1986, 1991; McDonald, 1983; Meteer, 1991; Scott & Sieckenius de Souza, 1990), or on the production of instructions or explanations in tutorial written dialogue, given as input a plan-based representation (Moore & Paris, 1993). Research in speech synthesis has focused on producing high-quality output given a (possibly marked-up) textual string as input (Beutnagel, Conkie, Schroeter, Stylianou, & Syrdal, 1999; Black & Lenzo, 2000; Sproat & Olive, 1995). Recently, however, as many applications have emerged that require spoken language output, such as spoken dialogue systems, briefing systems, speech-to-speech translation, automated sports commentators, and directions systems, there has been an increase in research that relates these two strands of work. This research is motivated by several goals. First, there is the potential for improving the quality of synthesis by using the generator to provide information about the purpose, meaning, and linguistic structure of the utterance to the synthesis process. A second goal is to use natural language generation to make it possible to customize systems that generate spoken language very quickly to individual users, sets of users, or new domains. There are a number of open research challenges. These include the generation of utterances in interactive dialogue that are sensitive to listeners' working memory constraints, the generation of speech acts whose purpose is other than to describe or inform, determining the appropriate prosody for spoken output, incorporating corpus-based or statistical knowledge into the generation and synthesis processes, generation of utterances in real time in dynamic environments, providing a deeper level of integration between generation and synthesis, and developing methods for evaluating the efficacy of different generation techniques.

Computer Speech and Language (2002) 16, 273–281. doi:10.1016/S0885-2308(02)00029-3. Available online at http://www.idealibrary.com

Marilyn A. Walker | Owen Rambow

[1] Mari Ostendorf, et al. Joint prosody prediction and unit selection for concatenative speech synthesis, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[2] Owen Rambow, et al. On the need for domain communication knowledge, 1991.

[3] Owen Rambow, et al. Applied Text Generation, 1992, ANLP.

[4] Johanna D. Moore, et al. Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information, 1993, CL.

[5] Chris Brew, et al. Stochastic text generation, 2000, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[6] Stephen Young. Probabilistic methods in spoken–dialogue systems, 2000, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[7] Kees van Deemter, et al. Context modeling and the generation of spoken discourse, 1997, Speech Commun.

[8] Robert F. Simmons, et al. Generating English discourse from semantic networks, 1972, CACM.

[9] Julia Hirschberg, et al. Exploring features from natural language generation for prosody modeling, 2002, Comput. Speech Lang.

[10] Richard Shillcock, et al. Proceedings of EUROSPEECH-1991, 1991.

[11] Alain Polguère, et al. Synthesizing Weather Forecasts from Formatted Data, 1986, COLING.

[12] Joseph Polifroni, et al. Formal and natural language generation in the Mercury conversational system, 2000, INTERSPEECH.

[13] Alan W. Black, et al. Limited domain synthesis, 2000, INTERSPEECH.

[14] Kuldip K. Paliwal, et al. Speech Coding and Synthesis, 1995.

[15] Alexander I. Rudnicky, et al. Stochastic natural language generation for spoken dialog systems, 2002, Comput. Speech Lang.

[16] Amanda J. Stent, et al. Dialogue Systems as Conversational Partners: Applying Conversation Acts Theory to Natural Language G, 2001.

[17] M. Meteer. Bridging the generation gap between text planning and linguistic realization, 1991.

[18] Benoit Lavoie, et al. A Fast and Portable Realizer for Text Generation Systems, 1997, ANLP.