Paper information - Corpus-Based Speech Synthesis

Corpus-Based Speech Synthesis

In this chapter, we present the main trends in corpus-based speech synthesis, assuming a stream of phonemes and prosodic targets as input. From the early diphone-based speech synthesizers, through the state-of-the-art unit-selection-based synthesizers, to the promising statistical parametric techniques, we emphasize the engineering trade-offs that arise when designing such systems.

Thierry Dutoit

[1] Olivier Boëffard, et al. Multilingual PSOLA text-to-speech system, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2] Eric Moulines, et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, 1989, Speech Commun.

[3] Silvia Quazza, et al. Choose the best to modify the least: a new generation concatenative synthesis system, 1999, EUROSPEECH.

[4] Alan W. Black, et al. Prosody and the Selection of Source Units for Concatenative Synthesis, 1997.

[5] Paul Taylor, et al. Automatically clustering similar units for unit selection in speech synthesis, 1997, EUROSPEECH.

[6] Thierry Dutoit, et al. MBR-PSOLA: Text-To-Speech synthesis based on an MBE re-synthesis of the segments database, 1993, Speech Commun.

[7] Bernd Möbius, et al. Rare Events and Closed Domains: Two Delicate Concepts in Speech Synthesis, 2003, Int. J. Speech Technol.

[8] N. Dixon, et al. Terminal analog synthesis of continuous speech using the diphone method of segment assembly, 1968.

[9] Mehryar Mohri, et al. Rapid unit selection from a large speech corpus for concatenative speech synthesis, 1999, EUROSPEECH.

[10] Mark A. Clements, et al. Speech synthesis based on sinusoidal modeling, 1996.

[11] Marc C. Beutnagel, et al. The AT&T Next-Gen TTS system, 1999.

[12] Slava Shechtman, et al. Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling, 2005, INTERSPEECH.

[13] Luís B. Almeida, et al. Frequency-varying sinusoidal modeling of speech, 1989, IEEE Trans. Acoust. Speech Signal Process.

[14] Heiga Zen, et al. An introduction of trajectory model into HMM-based speech synthesis, 2004, SSW.

[15] Thomas F. Quatieri, et al. Speech analysis/Synthesis based on a sinusoidal representation, 1986, IEEE Trans. Acoust. Speech Signal Process.

[16] Jithendra Vepa. Join cost for unit selection speech synthesis, 2004.

[17] Werner Verhelst, et al. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18] Michael W. Macon, et al. Control of spectral dynamics in concatenative speech synthesis, 2001, IEEE Trans. Speech Audio Process.

[19] C. Harris. A Study of the Building Blocks in Speech, 1953.

[20] Shinya Nakajima. Automatic synthesis unit generation for English speech synthesis based on multi-layered context oriented clustering, 1994, Speech Commun.

[21] Yong Zhao, et al. Custom-tailoring TTS voice font - keeping the naturalness when reducing database size, 2003, INTERSPEECH.

[22] Massimo Giustiniani, et al. A hidden Markov model approach to speech synthesis, 1989, EUROSPEECH.

[23] Yannis Stylianou, et al. Applying the harmonic plus noise model in concatenative speech synthesis, 2001, IEEE Trans. Speech Audio Process.

[24] R. V. Shannon, et al. Speech Recognition with Primarily Temporal Cues, 1995, Science.

[25] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[26] Alan W. Black, et al. Limited domain synthesis, 2000, INTERSPEECH.

[27] Robert E. Donovan, et al. The IBM trainable speech synthesis system, 1998, ICSLP.

[28] Jan P. H. van Santen, et al. Combinatorial issues in text-to-speech synthesis, 1997, EUROSPEECH.

[29] Walter F. Sendlmeier, et al. Hadifix: a system for German speech synthesis based on demisyllables, diphones and suffixes, 1990, SSW.

[30] Paul Taylor, et al. Speech synthesis by phonological structure matching, 1999, EUROSPEECH.