Implementation of Simple Spectral Techniques to Enhance the Intelligibility of Speech using a Harmonic Model

We have designed a system that increases the intelligibility of speech signals in noise by manipulating the parameters of a harmonic speech model. The system performs the transformation in two steps: in the first step, it modifies the spectral slope, which is closely related to the vocal effort; in the second step, it amplifies low-energy parts of the signal using dynamic range compression techniques. Objective and subjective measures involving speech-shaped noise confirm the effectiveness of these simple methods. As the harmonic model has been used in previous works to implement the waveform generation module of high-quality statistical synthesizers, the system presented here can provide the synthesis engine with a higher degree of control on the intelligibility of the resulting artificial speech.

[1]  Paavo Alku,et al.  Analysis of HMM-Based Lombard Speech Synthesis , 2011, INTERSPEECH.

[2]  K. S. Rhebergen,et al.  A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners. , 2005, The Journal of the Acoustical Society of America.

[3]  Inma Hernáez,et al.  Improved HNM-Based Vocoder for Statistical Synthesizers , 2011, INTERSPEECH.

[4]  Thierry Dutoit,et al.  Continuous Control of the Degree of Articulation in HMM-Based Speech Synthesis , 2011, INTERSPEECH.

[5]  Thierry Dutoit,et al.  Glottal-based analysis of the lombard effect , 2010, INTERSPEECH.

[6]  Thomas F. Quatieri,et al.  Speech analysis/Synthesis based on a sinusoidal representation , 1986, IEEE Trans. Acoust. Speech Signal Process..

[7]  Yannis Stylianou,et al.  Modeling Speech Based on Harmonic Plus Noise Models , 2004, Summer School on Neural Networks.

[8]  Alan W. Black,et al.  Improving the understandability of speech synthesis by modeling speech in noise , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[9]  Valerie HAZAN,et al.  1 CUE-ENHANCEMENT STRATEGIES FOR NATURAL VCV , 2007 .

[10]  Simon King,et al.  Can Objective Measures Predict the Intelligibility of Modified HMM-Based Synthetic Speech in Noise? , 2011, INTERSPEECH.

[11]  Ibon Saratxaga,et al.  Emotion Conversion Based on Prosodic Unit Selection , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  Heiga Zen,et al.  Statistical Parametric Speech Synthesis , 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  Xavier Rodet,et al.  A HMM-based speech synthesis system using a new glottal source and vocal-tract separation method , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[14]  J C Junqua,et al.  The Lombard reflex and its role on human listeners and automatic speech recognizers. , 1993, The Journal of the Acoustical Society of America.

[15]  B. Blesser,et al.  Audio dynamic range compression for minimum perceived distortion , 1969 .

[16]  Thierry Dutoit,et al.  A comparative study of glottal source estimation techniques , 2019, Comput. Speech Lang..

[17]  Inma Hernáez,et al.  HNM-based MFCC+F0 extractor applied to statistical speech synthesis , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18]  Susanto Rahardja,et al.  Lombard effect mimicking , 2010, SSW.

[19]  Ren-Hua Wang,et al.  Integrating Articulatory Features Into HMM-Based Parametric Speech Synthesis , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[20]  Martin Cooke,et al.  The contribution of changes in F0 and spectral tilt to increased intelligibility of speech produced in noise , 2009, Speech Commun..