From data to speech: a general approach

We present D2S, a data-to-speech system that can be used to create data-to-speech systems for different languages and domains. The defining characteristic of a data-to-speech system is that it combines language and speech generation: language generation produces a natural language text expressing the system's input data, and speech generation makes this text audible. D2S exploits this combination by using linguistic information available in the language generation module to compute prosody, which yields better prosodic output quality than a plain text-to-speech system can achieve. For language generation, D2S uses syntactically enriched templates, guided by knowledge of the discourse context; for speech generation, it combines pre-recorded phrases in a prosodically sophisticated manner. Together, these techniques make it possible to create linguistically sound yet efficient systems with high-quality language and speech output.
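To make the idea of context-guided templates feeding prosody more concrete, the following is a minimal sketch and not the D2S implementation. It assumes a simple given/new rule: a slot value that has not yet been mentioned in the discourse is marked for accent, and a value already mentioned is left unaccented. The names `DiscourseContext` and `fill_template`, the `+` accent marker, and the soccer example data are all hypothetical and chosen only for illustration; the actual system's templates and accentuation rules may well be considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class DiscourseContext:
    """Hypothetical discourse model: remembers values already mentioned."""
    mentioned: set = field(default_factory=set)

    def is_given(self, value: str) -> bool:
        return value in self.mentioned

    def update(self, value: str) -> None:
        self.mentioned.add(value)


def fill_template(template: str, data: dict, context: DiscourseContext) -> str:
    """Fill the template slots and prefix discourse-new values with '+'.

    The '+' prefix is an assumed notation for "accent this word"; a speech
    generation module could read it when choosing a prosodic variant.
    """
    filled = {}
    for slot, value in data.items():
        # New information is marked for accent; given information is not.
        filled[slot] = value if context.is_given(value) else f"+{value}"
    text = template.format(**filled)
    # Only after the sentence is built do its values count as mentioned.
    for value in data.values():
        context.update(value)
    return text


ctx = DiscourseContext()
first = fill_template("{team} played against {opponent}.",
                      {"team": "Ajax", "opponent": "PSV"}, ctx)
second = fill_template("{team} scored {goals} goals.",
                       {"team": "Ajax", "goals": "two"}, ctx)
print(first)   # +Ajax played against +PSV.
print(second)  # Ajax scored +two goals.
```

In the second sentence "Ajax" is no longer marked because it is given information at that point, which is the kind of linguistic knowledge a plain text-to-speech system would have to guess from the text alone.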
