A new Japanese TTS system based on speech-prosody database and speech modification

This paper describes a new Japanese text-to-speech (TTS) system that can produce highly natural and intelligible synthetic speech. Its performance derives from three new approaches: (1) a prosody control algorithm that uses prosody data extracted from a natural speech database, together with a duration control algorithm based on statistical estimation; (2) a new type of synthesis unit consisting of a consonant followed by a chain of vowels, which suppresses unnatural sounds and acoustic discontinuities at concatenation points because units are prepared with various lengths and various F0 contours; and (3) a speech modification algorithm based on harmonics reconstruction. Listening tests were carried out to evaluate both the individual modules and the overall performance of the system. The results confirm that the new modules work together effectively and that the system can produce high-quality synthetic speech.
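The benefit of consonant-plus-vowel-chain units of various lengths can be illustrated with a small sketch. It assumes a romanized phoneme sequence, a simplified five-vowel set, and a hypothetical unit inventory stored as tuples. It also assumes a greedy longest-match strategy, which is an illustrative choice and not necessarily how the paper selects units. Longer matching units leave fewer concatenation points. The sketch does not model choosing among stored variants with different F0 contours.

```python
from typing import List, Sequence, Set, Tuple

# Simplified Japanese vowel set (romanized); special morae such as the
# moraic nasal or geminate are ignored in this sketch.
VOWELS = {"a", "i", "u", "e", "o"}

Unit = Tuple[str, ...]


def segment_into_cv_units(phonemes: Sequence[str], inventory: Set[Unit]) -> List[Unit]:
    """Hypothetical greedy segmentation into consonant + vowel-chain units.

    At each position, take the longest unit in `inventory` that starts there
    and spans only the following vowels. Fall back to a single phoneme when
    no stored unit matches.
    """
    units: List[Unit] = []
    i = 0
    n = len(phonemes)
    while i < n:
        # A unit may extend over the run of vowels after the first phoneme.
        end = i + 1
        while end < n and phonemes[end] in VOWELS:
            end += 1
        chosen: Unit = (phonemes[i],)  # fallback: one phoneme
        for j in range(end, i, -1):  # try the longest candidate first
            candidate = tuple(phonemes[i:j])
            if candidate in inventory:
                chosen = candidate
                break
        units.append(chosen)
        i += len(chosen)
    return units
```

For example, with `("k", "a", "i")` in the inventory, `["s", "a", "k", "a", "i"]` is covered by two units (one concatenation point). Without it, the same input needs three units: `("s", "a")`, `("k", "a")`, `("i",)`.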
