Paper information - Optimising selection of units from speech databases for concatenative synthesis


Concatenating units of natural speech is one method of speech synthesis. Most such systems use an inventory of fixed-length units, typically diphones or triphones, with one instance of each type. An alternative is to use more varied, non-uniform units extracted from large speech databases containing multiple instances of each. The greater variability in such natural speech segments allows closer modeling of naturalness and differences in speaking styles, and eliminates the need for specially-recorded, single-use databases. However, with the greater variability comes the problem of how to select between the many instances of units in the database. This paper addresses that issue and presents a general method for unit selection.

Nick Campbell | Alan W. Black
