Speech synthesis, speech simulation and speech science

Speech synthesis research has been transformed in recent years through the exploitation of speech corpora, both for statistical modelling and as a source of signals for concatenative synthesis. This revolution in methodology, and the new techniques it brings, calls into question the received wisdom that better computer voice output will come from a better understanding of how humans produce speech. This paper discusses the relationship between this new technology of simulated speech and the traditional aims of speech science. It suggests that the goal of speech simulation frees engineers from inadequate linguistic and physiological descriptions of speech, while at the same time leaving speech scientists free to return to their proper goal of building a computational model of human speech production.
