Paper information - Implementation of Simple Spectral Techniques to Enhance the Intelligibility of Speech using a Harmonic Model

We have designed a system that increases the intelligibility of speech signals in noise by manipulating the parameters of a harmonic speech model. The system performs the transformation in two steps: in the first step, it modifies the spectral slope, which is closely related to the vocal effort; in the second step, it amplifies low-energy parts of the signal using dynamic range compression techniques. Objective and subjective measures involving speech-shaped noise confirm the effectiveness of these simple methods. As the harmonic model has been used in previous work to implement the waveform generation module of high-quality statistical synthesizers, the system presented here can give the synthesis engine a higher degree of control over the intelligibility of the resulting artificial speech.
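
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the function names, the dB-per-octave tilt, the 1 kHz pivot frequency, the compression ratio, and the energy renormalization are all assumptions chosen for illustration. The sketch assumes each voiced frame is described by a fundamental frequency and a vector of harmonic amplitudes.

```python
import numpy as np

def tilt_spectral_slope(amps, f0, db_per_octave=3.0, f_ref=1000.0):
    """Step 1 (assumed form): flatten the spectral slope of one frame.

    Harmonic k sits at k * f0 Hz. Each harmonic is boosted by
    db_per_octave for every octave above f_ref (and attenuated below it),
    then the frame is rescaled so its total energy is unchanged.
    """
    amps = np.asarray(amps, dtype=float)
    freqs = f0 * np.arange(1, len(amps) + 1)
    gain_db = db_per_octave * np.log2(freqs / f_ref)
    tilted = amps * 10.0 ** (gain_db / 20.0)
    # Keep frame energy constant so only the energy distribution changes.
    scale = np.sqrt(np.sum(amps ** 2) / np.sum(tilted ** 2))
    return tilted * scale

def compression_gains_db(frame_energy_db, ratio=2.0):
    """Step 2 (assumed form): static dynamic range compression.

    The distance of every frame from the loudest frame is divided by
    ratio, so low-energy frames receive the largest gain. Returns the
    gain in dB to apply to each frame.
    """
    energy = np.asarray(frame_energy_db, dtype=float)
    peak = energy.max()
    target = peak + (energy - peak) / ratio
    return target - energy

# Hypothetical frame: f0 = 200 Hz, four harmonics with a steep slope.
frame = np.array([1.0, 0.5, 0.25, 0.125])
tilted = tilt_spectral_slope(frame, f0=200.0)

# Hypothetical frame energies in dB relative to the loudest frame.
gains = compression_gains_db([-40.0, -20.0, 0.0], ratio=2.0)
```

In this sketch the tilt only redistributes energy toward the higher harmonics, and the compressor raises quiet frames relative to loud ones. A practical system would typically also renormalize the overall signal level after compression so that comparisons with unmodified speech are made at equal energy.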

Inma Hernáez | Eva Navas | Daniel Erro | Yannis Stylianou
