UNIT SELECTION SPEECH SYNTHESIS IN NOISE

This paper presents an approach to unit selection speech synthesis in noise. The approach modifies the speech synthesis method originally published in [4], in which the distance of a candidate unit from its cluster center is used as the unit selection cost. We found that augmenting the unit cost with an additional measure of intelligibility can improve the overall understandability of speech in noise. The measure we chose for predicting speech intelligibility in noise is the Speech Intelligibility Index (SII). The SII value for each unit in the speech corpus was computed off-line, with pink noise used as the representative noise for the calculation. Listening tests indicate that even this simple modification of the unit cost in unit selection synthesis can improve the understandability of speech delivered under poor channel conditions.
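The core idea can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' implementation: it uses the simplified band-audibility form of the SII (per-band SNR clipped to ±15 dB, mapped to [0, 1], and weighted by band-importance values), evaluated against a fixed pink-noise spectrum, and folds the result into the unit cost alongside the cluster-center distance. The number of bands, the flat importance weights, and the weighting factor `alpha` are all assumptions chosen for readability.

```python
import numpy as np

def sii(speech_level_db, noise_level_db, band_importance):
    """Simplified Speech Intelligibility Index: per-band SNR is clipped to
    [-15, +15] dB, mapped linearly to an audibility in [0, 1], and summed
    with band-importance weights (which should sum to 1)."""
    snr = np.clip(np.asarray(speech_level_db) - np.asarray(noise_level_db),
                  -15.0, 15.0)
    audibility = (snr + 15.0) / 30.0
    return float(np.sum(np.asarray(band_importance) * audibility))

def unit_cost(cluster_distance, unit_spectrum_db, noise_spectrum_db,
              band_importance, alpha=0.5):
    """Combined unit cost: a unit with low predicted intelligibility
    (SII near 0) is penalized. `alpha` is an illustrative weight,
    not a value taken from the paper."""
    intelligibility = sii(unit_spectrum_db, noise_spectrum_db, band_importance)
    return cluster_distance + alpha * (1.0 - intelligibility)

# Pink noise loses 3 dB per octave, so octave-band levels fall linearly in dB.
bands = 6
pink_db = 60.0 - 3.0 * np.arange(bands)        # 60, 57, ..., 45 dB
importance = np.full(bands, 1.0 / bands)       # flat weights (assumption)
speech_db = np.full(bands, 60.0)               # hypothetical unit spectrum
print(round(unit_cost(1.2, speech_db, pink_db, importance), 3))
```

Because the noise spectrum is fixed, the SII term can be precomputed per unit and stored with the corpus, so the run-time selection cost is no more expensive than the original cluster-distance cost.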

[1] H. S. Venkatagiri, "Segmental intelligibility of four currently used text-to-speech synthesis methods," The Journal of the Acoustical Society of America, 2003.

[2] K. S. Rhebergen et al., "A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners," The Journal of the Acoustical Society of America, 2005.

[3] S. Hawkins et al., "Assessment of naturalness in the ProSynth speech synthesis project," 2000.

[4] P. Taylor et al., "Automatically clustering similar units for unit selection in speech synthesis," EUROSPEECH, 1997.

[5] A. W. Black et al., "Creating a database of speech in noise for unit selection synthesis," SSW, 2004.

[6] T. Dutoit et al., "TTSBOX: a MATLAB toolbox for teaching text-to-speech synthesis," Proc. IEEE ICASSP, 2005.

[7] A. W. Black et al., "Improving the understandability of speech synthesis by modeling speech in noise," Proc. IEEE ICASSP, 2005.