Perceptually-based data-driven join costs: comparing join types

Unit selection synthesis has improved the quality of synthetic speech by making it possible to concatenate speech from a large database to produce intelligible synthesis while preserving much of the naturalness of the original signal. Such synthesis is by no means perfect, however, and this paper describes work to achieve better joins between concatenated units. Results from a psychoacoustic experiment are analyzed together with acoustic parameters and phonetic factors, and are used in the statistical training of join costs so that audible discontinuities at concatenation boundaries can be minimized.
