Speech synthesis by phonological structure matching

This paper presents a new unit-selection technique for speech synthesis. Both the synthesis target and the speech database are specified as phonological trees, and a selection algorithm finds the largest parts of the database trees that match parts of the target tree. Because the selection implicitly incorporates the work of prosody generation modules, the technique avoids many of the errors those modules make. A method for applying signal processing only where it is most needed is also described. The technique produces better-quality speech than previous approaches and is also significantly faster.
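The idea of matching the largest database subtrees against the target tree can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes exact structural matches only, ignores phonological context and the choice among competing candidates, and uses hypothetical names (`Node`, `index_subtrees`, `select_units`) and a toy two-syllable database invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Node:
    """A phonological tree node; frozen so equal subtrees compare and hash alike."""
    label: str
    children: Tuple["Node", ...] = ()


def syl(*phones: str) -> Node:
    """Build a syllable node whose children are phone leaves (toy structure)."""
    return Node("syl", tuple(Node(p) for p in phones))


def phones(node: Node) -> Tuple[str, ...]:
    """Return the leaf labels (phones) under a node, left to right."""
    if not node.children:
        return (node.label,)
    return tuple(p for child in node.children for p in phones(child))


def index_subtrees(root: Node, index: Set[Node]) -> None:
    """Record every subtree of a database tree so it can be looked up by structure."""
    index.add(root)
    for child in root.children:
        index_subtrees(child, index)


def select_units(target: Node, index: Set[Node]) -> List[Tuple[bool, Tuple[str, ...]]]:
    """Top-down search: take the largest target subtree found in the database,
    descending to the children only when the current subtree has no match."""
    if target in index:
        return [(True, phones(target))]
    if not target.children:
        # Unmatched leaf: no database unit covers this phone.
        return [(False, phones(target))]
    units: List[Tuple[bool, Tuple[str, ...]]] = []
    for child in target.children:
        units.extend(select_units(child, index))
    return units


# Toy database utterance ("cat sat") and target ("cat sit"); both are assumptions.
database = Node("phrase", (syl("k", "ae", "t"), syl("s", "ae", "t")))
target = Node("phrase", (syl("k", "ae", "t"), syl("s", "ih", "t")))

index: Set[Node] = set()
index_subtrees(database, index)

for matched, unit in select_units(target, index):
    print("matched" if matched else "missing", unit)
```

In this toy case the first syllable is taken from the database as a single unit, while the second falls back to individual phones, one of which is missing. Under the abstract's description, joins between such smaller units are presumably where signal processing would be most needed.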
