Paper information - Prosodic Modeling in Text-to-Speech Synthesis

This paper discusses three broad obstacles that must be overcome to improve prosodic quality in text-to-speech systems. The first is the set of direct and indirect limits imposed by the signal processing (“synthesis”) components. The second is the set of combinatorial and statistical constraints inherent in generalizing from training corpora to unrestricted domains, which require the integration of content-specific knowledge and detailed mathematical modeling. The third is the nature of many empirical research issues that must be solved for prosodic modeling to improve: they are often too focused and model-dependent for academe, and too long-term for development organizations.

Jan P. H. van Santen

[1] Murray F. Spiegel, et al. Using dynamic time warping to formulate duration rules for speech synthesis, 1989.

[2] Jan P. H. van Santen, et al. Combinatorial issues in text-to-speech synthesis, 1997, EUROSPEECH.

[3] Jan P. H. van Santen, et al. Contextual effects on vowel duration, 1992, Speech Commun.

[4] D. Klatt. Letter: Interaction between two factors that influence vowel duration, 1973, The Journal of the Acoustical Society of America.

[5] John Kingston, et al. Macro and micro F0 in the synthesis of intonation, 1990.

[6] D. H. Klatt, et al. Review of text-to-speech conversion for English, 1987, The Journal of the Acoustical Society of America.

[7] Christophe d'Alessandro, et al. Modification of the Aperiodic Component of Speech Signals for Synthesis, 1997.

[8] J. V. Santen. Exploring N-way tables with sums-of-products models, 1993.

[9] Christof Traber. F0 generation with a data base of natural F0 patterns and with a neural network, 1990, SSW.

[10] Hiroya Fujisaki, et al. Dynamic Characteristics of Voice Fundamental Frequency in Speech and Singing, 1983.

[11] David B. Pisoni, et al. Text-to-speech: the MITalk system, 1987.