Spectral tilt modelling with GMMs for intelligibility enhancement of narrowband telephone speech

In mobile communications, post-processing methods are used to improve the intelligibility of speech in adverse background noise conditions. This study investigates post-processing based on modelling the Lombard effect. It compares different spectral envelope estimation methods, combined with Gaussian mixture modelling, as a means of changing the spectral tilt of speech in a post-processing algorithm. Six spectral envelope estimation methods are compared using objective distortion measures as well as subjective word-error rate and quality tests in different near-end noise conditions. The results show that one of the envelope estimation methods, stabilised weighted linear prediction, yielded a statistically significant improvement in intelligibility over unprocessed speech.
