SPEECH SYNTHESIS FOR TEXT-TO-SPEECH ALIGNMENT AND PROSODIC FEATURE EXTRACTION

This paper presents a new and promising approach to the text-to-speech alignment problem. It develops an original idea: a high-quality digital speech synthesizer creates a reference speech pattern that is used during the alignment process. The system has been used and tested to extract the prosodic features of read French utterances. The results show a segmentation error rate of about 8%. This system will be a powerful tool for the automatic creation of large prosodically labeled databases and for research on automatic prosody generation.
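To make the idea concrete, the following minimal sketch shows one way a synthesized reference could drive an alignment. The abstract does not name the alignment algorithm or the acoustic features; dynamic time warping (DTW) over generic per-frame feature vectors is assumed here purely for illustration, and the function names `dtw_align` and `transfer_boundaries` are hypothetical. Because the synthesizer knows where each phoneme starts in its own output, mapping those known boundaries through the warping path gives boundary estimates in the natural recording.

```python
def dtw_align(reference, target):
    """Return the minimum-cost DTW path between two feature sequences.

    Both arguments are lists of equal-dimension feature vectors (one per frame).
    The path is a list of (reference_frame, target_frame) index pairs.
    DTW is an assumption of this sketch, not a method stated in the abstract.
    """
    n, m = len(reference), len(target)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning the first i reference frames
    # with the first j target frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames.
            d = sum((a - b) ** 2
                    for a, b in zip(reference[i - 1], target[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # reference advances
                                 cost[i][j - 1],      # target advances
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return path


def transfer_boundaries(path, ref_boundaries):
    """Map each reference boundary frame to the first target frame aligned with it."""
    return [min(j for i, j in path if i == b) for b in ref_boundaries]


# Toy one-dimensional "features": the natural utterance is a time-stretched
# version of the synthetic reference.
synthetic = [[0.0], [1.0], [2.0]]
natural = [[0.0], [0.0], [1.0], [2.0], [2.0]]
path = dtw_align(synthetic, natural)
boundaries = transfer_boundaries(path, [1, 2])
```

In a real system the frames would be spectral feature vectors rather than scalars, and the path constraints and distance measure would need tuning; the sketch only illustrates how segment boundaries known in the synthetic signal can be carried over to the natural one.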
