Corpus-Based Speech Synthesis

In this chapter, we present the main trends in corpus-based speech synthesis, assuming a stream of phonemes and prosodic targets as input. Moving from the early diphone-based synthesizers, through state-of-the-art unit-selection synthesizers, to promising statistical parametric techniques, we emphasize the engineering trade-offs that arise when designing such systems.
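The core trade-off in unit selection can be shown in a small example. Selection is treated as a Viterbi search over candidate units from the corpus. The search minimises the sum of two costs. Target costs measure the mismatch between a unit and the requested phoneme and prosodic targets. Join costs measure the mismatch at concatenation points. This is a minimal sketch, and several parts of it are assumptions made for illustration rather than the chapter's method:

- the `Target` and `Unit` classes;
- the cost functions and their weights;
- the rule that units adjacent in the original recording join at zero cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One requested segment: phoneme plus prosodic targets (hypothetical fields)."""
    phoneme: str
    f0: float        # target pitch in Hz
    duration: float  # target duration in ms


@dataclass(frozen=True)
class Unit:
    """One recorded corpus unit (hypothetical fields)."""
    unit_id: int     # position in the original recording
    phoneme: str
    f0: float        # mean pitch in Hz
    duration: float  # duration in ms
    start_f0: float  # pitch at the unit's left edge
    end_f0: float    # pitch at the unit's right edge


def target_cost(t: Target, u: Unit, w_f0: float = 1.0, w_dur: float = 0.1) -> float:
    # Assumed cost: weighted absolute prosodic mismatch.
    return w_f0 * abs(t.f0 - u.f0) + w_dur * abs(t.duration - u.duration)


def join_cost(prev: Unit, nxt: Unit, w_join: float = 1.0) -> float:
    # Assumed cost: units that were contiguous in the corpus join for free;
    # otherwise penalise the pitch jump at the boundary.
    if nxt.unit_id == prev.unit_id + 1:
        return 0.0
    return w_join * abs(prev.end_f0 - nxt.start_f0)


def select_units(targets: list[Target], inventory: list[Unit]) -> tuple[list[Unit], float]:
    """Viterbi search for the unit sequence with minimal total cost."""
    candidates = [[u for u in inventory if u.phoneme == t.phoneme] for t in targets]
    if any(not c for c in candidates):
        raise ValueError("no candidate unit for some target phoneme")

    # best[i][j] = (cumulative cost ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], u), back))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Usage: unit 1 directly follows unit 0 in the recording; unit 7 does not.
targets = [Target("h", 100, 80), Target("a", 120, 150)]
inventory = [
    Unit(0, "h", 100, 80, 100, 105),
    Unit(1, "a", 110, 150, 105, 115),  # 10 Hz off target, but contiguous
    Unit(7, "a", 120, 150, 130, 120),  # on target, but a 25 Hz jump at the join
]
path, total = select_units(targets, inventory)
print([u.unit_id for u in path], total)
```

With these assumed weights, the search keeps the contiguous unit 1 at a total cost of 10.0 and rejects the prosodically exact unit 7 at 25.0. It accepts a prosodic mismatch to avoid a poor concatenation. Retuning the weights shifts that balance, and this is one of the design trade-offs the chapter discusses.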
