High-quality speech synthesis for phonetic speech segmentation

This paper presents an original technique for phonetic segmentation. It uses a speech synthesizer to align a text with its corresponding speech signal: a high-quality digital speech synthesizer generates a synthetic speech pattern, which serves as the reference in the alignment process. The main advantage over other approaches is that no training stage, and hence no labeled database, is needed. The system has been evaluated mainly on read French utterances, with further evaluations on English, German, Romanian and Spanish. These experiments suggest that the system is a powerful tool for the automatic creation of large phonetically and prosodically labeled speech databases. The availability of such corpora will be key to the development of improved speech synthesis and recognition systems.
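
The abstract does not name the alignment algorithm. As an illustration only, the sketch below assumes dynamic time warping (DTW) over one scalar feature per frame: the synthetic reference comes with known phoneme boundaries (the synthesizer knows where each phoneme starts), and these are carried over to the natural signal along the warping path. The function names `dtw_path` and `transfer_boundaries`, the scalar features, and the absolute-difference distance are all hypothetical choices, not the authors' method.

```python
from math import inf


def dtw_path(ref, sig):
    """Return a minimum-cost DTW path as a list of (ref_frame, sig_frame) pairs.

    Assumption: each frame is a single number and the local distance is the
    absolute difference. A real system would use spectral feature vectors.
    """
    n, m = len(ref), len(sig)
    # cost[i][j] = cheapest cumulative distance aligning ref[:i+1] with sig[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(ref[i] - sig[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = inf
            if i > 0:
                best = min(best, cost[i - 1][j])
            if j > 0:
                best = min(best, cost[i][j - 1])
            if i > 0 and j > 0:
                best = min(best, cost[i - 1][j - 1])
            cost[i][j] = d + best

    # Backtrack from the last cell to the first, always taking the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


def transfer_boundaries(path, ref_boundaries):
    """Map each reference boundary frame to the first signal frame aligned with it."""
    first_match = {}
    for i, j in path:
        first_match.setdefault(i, j)
    return [first_match[b] for b in ref_boundaries]


# Toy data (made up): three "phonemes" with feature values 0, 5 and 1.
synthetic = [0, 0, 5, 5, 1, 1]          # reference; phonemes start at frames 0, 2, 4
natural = [0, 0, 0, 5, 5, 5, 5, 1, 1]   # same sequence, spoken more slowly
boundaries = transfer_boundaries(dtw_path(synthetic, natural), [2, 4])
print(boundaries)
```

On this toy input the boundaries at reference frames 2 and 4 map to frames 3 and 7 of the slower signal. Nothing in the sketch is trained, which mirrors the advantage claimed in the abstract: the only inputs are the synthetic reference, its known boundaries, and the natural signal.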
