Spectral tilt modelling with extrapolated GMMs for intelligibility enhancement of narrowband telephone speech

Post-processing methods are used in mobile communications to improve the intelligibility of speech in adverse background noise conditions. This study investigates post-processing that modifies the spectral tilt with Gaussian mixture models (GMMs) according to the Lombard effect. A spectral envelope estimation method is studied and optimized for this purpose, and the extrapolation of the statistical mapping in a post-processing context is also investigated. The proposed post-processing methods are compared with unprocessed speech and a reference method in subjective intelligibility and quality tests under different near-end noise conditions. The results indicate that one of the extrapolated methods achieves the same intelligibility as fixed high-pass filtering without degrading the quality of speech.
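As an illustration of what a GMM-based statistical mapping can look like, the sketch below computes the conditional mean E[y | x] under a joint GMM over source features x (e.g. normal-speech tilt parameters) and target features y (e.g. Lombard-speech tilt parameters). This is the standard conditional-expectation mapping used in GMM-based voice conversion, not necessarily the exact formulation of this study. The function names `gmm_map` and `extrapolate`, the factor `alpha`, and the reading of "extrapolation" as scaling the mapped change beyond the learned target are all assumptions made for illustration.

```python
import numpy as np

def gmm_map(x, weights, means, covs, dx):
    """Conditional mean E[y | x] under a joint GMM over z = [x, y].

    weights: K mixture weights; means: K vectors of length dx + dy;
    covs: K full covariance matrices; dx: dimension of x.
    """
    x = np.asarray(x, dtype=float)
    log_resp = np.empty(len(weights))
    cond_means = []
    for k, (w, mu, S) in enumerate(zip(weights, means, covs)):
        mu, S = np.asarray(mu, dtype=float), np.asarray(S, dtype=float)
        mu_x, mu_y = mu[:dx], mu[dx:]
        Sxx, Syx = S[:dx, :dx], S[dx:, :dx]
        diff = x - mu_x
        sol = np.linalg.solve(Sxx, diff)          # Sxx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(Sxx)
        # log of w_k * N(x; mu_x, Sxx)
        log_resp[k] = np.log(w) - 0.5 * (diff @ sol + logdet + dx * np.log(2 * np.pi))
        # component-wise linear regression of y on x
        cond_means.append(mu_y + Syx @ sol)
    resp = np.exp(log_resp - log_resp.max())      # numerically stable posterior
    resp /= resp.sum()
    return sum(r * c for r, c in zip(resp, cond_means))

def extrapolate(x, y_mapped, alpha):
    """Hypothetical extrapolation: alpha = 1 returns the mapping itself,
    alpha > 1 pushes the features further in the mapped direction.
    Assumes x and y share the same dimension."""
    x = np.asarray(x, dtype=float)
    return x + alpha * (np.asarray(y_mapped, dtype=float) - x)
```

With a single mixture component the mapping reduces to ordinary linear regression of y on x, which is a convenient sanity check before training a multi-component model on real feature pairs.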
