High-quality text-to-speech synthesis: an overview

This paper tries to give a comprehensive introduction to state-of-the-art Text-To-Speech (TTS) synthesis by highlighting its Digital Signal Processing (DSP) and Natural Language Processing (NLP) components. Because very few people combine a good knowledge of DSP with a comprehensive insight into NLP, synthesis remains largely unclear, even to people working in either research area. After a brief definition of a general TTS system and of its commercial applications in Section 1, the paper is divided into two main parts. Section 2.1 begins with a presentation of the many practical NLP problems that a TTS system has to solve. We then examine, in Section 2.2, how synthetic speech can be obtained by simply concatenating elementary speech units, and what choices have to be made for this operation to yield high quality. We finally give a word on existing TTS solutions, with special emphasis on the computational and economic constraints that have to be kept in mind when designing TTS systems.
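The abstract mentions obtaining synthetic speech by concatenating elementary speech units. As a rough illustration only, the sketch below joins units given as lists of samples and smooths each junction with a linear crossfade. The function name `crossfade_concat` and the `fade_len` parameter are hypothetical, and the crossfade is a generic smoothing choice, not the method described in the paper; practical concatenative systems also have to adjust pitch and duration, which this sketch ignores.

```python
def crossfade_concat(units, fade_len):
    """Concatenate speech units (lists of float samples).

    Hypothetical helper: each new unit overlaps the end of the output by up to
    fade_len samples, and the overlap is mixed with a linear crossfade.
    """
    out = []
    for unit in units:
        unit = list(unit)
        # Overlap cannot exceed what is available on either side of the junction.
        n = min(fade_len, len(out), len(unit))
        if n == 0:
            out.extend(unit)  # nothing to blend: plain concatenation
            continue
        head, tail = out[:-n], out[-n:]
        mixed = []
        for i in range(n):
            # Weight of the incoming unit rises linearly, staying strictly between 0 and 1.
            w = (i + 1) / (n + 1)
            mixed.append((1 - w) * tail[i] + w * unit[i])
        out = head + mixed + unit[n:]
    return out
```

For example, `crossfade_concat([[1.0] * 4, [0.0] * 4], 2)` returns six samples, because each junction shortens the output by the overlap length. The crossfade only hides discontinuities in amplitude at the joins; it does nothing about mismatched pitch or spectral envelope between units.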

Thierry Dutoit