Text-to-speech synthesis

This paper gives an overview of the problems that arise in text-to-speech (TTS) conversion and describes the solutions adopted in the AT&T Bell Laboratories TTS system. Besides discussing the linguistic and speech-analysis issues that a high-quality TTS system must address, it outlines the system's modular architecture and the advantages of that design.
