Paper information - Diphone concatenation using a harmonic plus noise model of speech

In this paper we present a high-quality text-to-speech system using diphones. The system is based on a Harmonic plus Noise (HNM) representation of the speech signal. HNM is a pitch-synchronous analysis-synthesis system but does not require pitch marks to be determined as necessary in PSOLA-based methods. HNM assumes the speech signal to be composed of a periodic part and a stochastic part. As a result, different prosody and spectral envelope modification methods can be applied to each part, yielding more natural-sounding synthetic speech. The fully parametric representation of speech using HNM also provides a straightforward way of smoothing diphone boundaries. Informal listening tests, using natural prosody, have shown that the synthetic speech quality is close to the quality of the original sentences, without smoothing problems and without buzziness or other oddities observed with other speech representations used for TTS.
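The decomposition described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes one short frame in which the periodic part is a sum of cosines at integer multiples of a fundamental frequency and the stochastic part is plain white Gaussian noise. The function name `synthesize_hnm_frame` and all of its parameters are hypothetical, and the sketch leaves out analysis, spectral envelopes, and any shaping of the noise.

```python
import math
import random


def synthesize_hnm_frame(f0, harmonic_amps, noise_gain,
                         sample_rate=16000, n_samples=160, seed=0):
    """Toy harmonic-plus-noise frame (illustrative assumption, not the paper's method).

    Returns three lists of samples: the periodic part, the stochastic part,
    and their sum.
    """
    rng = random.Random(seed)  # fixed seed so the noise part is reproducible

    # Periodic part: harmonic k+1 sits at (k+1) * f0 with amplitude harmonic_amps[k].
    harmonic = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = sum(
            amp * math.cos(2.0 * math.pi * (k + 1) * f0 * t)
            for k, amp in enumerate(harmonic_amps)
        )
        harmonic.append(sample)

    # Stochastic part: white Gaussian noise scaled by noise_gain (a simplification).
    noise = [noise_gain * rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    total = [h + z for h, z in zip(harmonic, noise)]
    return harmonic, noise, total


# Changing the pitch only touches the periodic part; with the same seed the
# stochastic part is unchanged, which is what lets each part be modified separately.
harm_a, noise_a, total_a = synthesize_hnm_frame(100.0, [1.0, 0.5], 0.1)
harm_b, noise_b, total_b = synthesize_hnm_frame(120.0, [1.0, 0.5], 0.1)
```

Because the two parts are generated from separate parameters, a pitch change in this toy model amounts to changing `f0` while keeping the noise parameters fixed.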

Thierry Dutoit | Yannis Stylianou | Juergen Schroeter

[1] Eric Moulines, et al. High-quality speech modification based on a harmonic + noise model, 1995, EUROSPEECH.

[2] Olivier Boëffard, et al. Improving the robustness of text-to-speech synthesizers for large prosodic variations, 1994, SSW.

[3] Eric Moulines, et al. HNS: Speech modification based on a harmonic+noise model, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Fabrice Plante, et al. Phase modelling of speech excitation for low bit-rate sinusoidal transform coding, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5] Thierry Dutoit, et al. On the use of a hybrid harmonic/stochastic model for TTS synthesis-by-concatenation, 1996, Speech Commun.

[6] Mark A. Clements, et al. Speech synthesis based on sinusoidal modeling, 1996.

[7] Eric Moulines, et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, 1989, Speech Commun.

[8] Yannis Stylianou, et al. Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification, 1996.