Perfect synthesis for all of the people all of the time

The quality of speech synthesis has improved drastically over the last ten years, or at least it appears to have. We have moved from diphones to unit selection. However, although we can now produce much more natural-sounding examples, we have also given up a certain amount of control over what can be synthesized. We have reached the stage where playing a few examples to a non-expert can easily convince them that speech synthesis is a solved problem. This paper looks not only at how we might convince some of the people some of the time, but at what we must do to produce perfect synthesis for all of the people all of the time.
