Paper information - Text-to-speech synthesis

Text-to-speech synthesis

This paper presents an overview of the problems that arise in text-to-speech (TTS) conversion and describes the solutions to these problems adopted in the AT&T Bell Laboratories TTS system. In addition to discussing the linguistic and speech analysis issues that a high-quality TTS system must address, the paper outlines the modular architecture of the AT&T Bell Laboratories TTS system and the advantages of that modular design.

Joseph P. Olive | Richard W. Sproat

[1] Jan P. H. van Santen,et al. Assignment of segmental duration in text-to-speech synthesis , 1994, Comput. Speech Lang..

[2] D H Klatt,et al. Review of text-to-speech conversion for English. , 1987, The Journal of the Acoustical Society of America.

[3] Douglas D. O'Shaughnessy,et al. Parsing with a Small Dictionary for Applications such as Text to Speech , 1989, Comput. Linguistics.

[4] Julia Hirschberg,et al. Pitch Accent in Context: Predicting Intonational Prominence from Text , 1993, Artif. Intell..

[5] J. Pierrehumbert. The phonology and phonetics of English intonation , 1987 .

[6] Eric Moulines,et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun..

[7] Mark Liberman,et al. Synthesis by rule of English intonation patterns , 1984, ICASSP '84. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8] Leo Breiman,et al. Classification and Regression Trees , 1984 .

[9] Bernd Möbius,et al. Ein quantitatives Modell der deutschen Intonation : Analyse und Synthese von Grundfrequenzverläufen , 1993 .

[10] Chilin Shih,et al. A Stochastic Finite-State Word-Segmentation Algorithm for Chinese , 1994, ACL.

[11] D. Ladd. The structure of intonational meaning , 1978 .

[12] A. Liberman,et al. Acoustic Loci and Transitional Cues for Consonants , 1954 .

[13] Vincent J. van Heuven,et al. Analysis and synthesis of speech: strategic research towards high-quality text-to-speech generation , 1993 .

[14] David Yarowsky,et al. A corpus-based synthesizer , 1992, ICSLP.

[15] Ann K. Syrdal. Improved duration rules for text‐to‐speech synthesis , 1989 .

[16] Jan P. H. van Santen. Perceptual experiments for diagnostic testing of text-to-speech systems , 1993, Comput. Speech Lang..

[17] David Yarowsky,et al. Homograph disambiguation in speech synthesis , 1994, Speech Synthesis Workshop.

[18] D. Klatt. Linguistic uses of segmental duration in English: acoustic and perceptual evidence. , 1976, The Journal of the Acoustical Society of America.

[19] Eileen Fitzpatrick,et al. A Computational Grammar of Discourse-Neutral Prosodic Phrasing in English , 1990, Comput. Linguistics.

[20] Julia Hirschberg,et al. Automatic classification of intonational phrase boundaries , 1992 .

[21] Kenneth Ward Church. A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text , 1988, ANLP.

[22] Susan R. Hertz,et al. Some remarks on Klatt’s ‘‘Review of text‐to‐speech conversion for English’’ [J. Acoust. Soc. Am. 82, 737–793 (1987)] , 1988 .

[23] Richard Sproat. English noun-phrase accent prediction for text-to-speech , 1994, Comput. Speech Lang..

[24] Michael Riley,et al. Some Applications of Tree-based Modelling to Speech and Language , 1989, HLT.

[25] Kimmo Koskenniemi,et al. A General Computational Model for Word-Form Recognition and Production , 1984, ACL.

[26] Hiroya Fujisaki,et al. Dynamic Characteristics of Voice Fundamental Frequency in Speech and Singing , 1983 .