Formant filters-based multi-band speech enhancement algorithm for intelligibility improvement

In the past, speech enhancement algorithms concentrated on improving speech quality; however, such algorithms do not necessarily improve the intelligibility of the enhanced speech. The current work focuses on improving both the quality and the intelligibility of the well-known multi-band spectral subtraction algorithm. To improve speech quality, a temporal-domain filtering-based approach is proposed to obtain ERB-based sub-bands. To improve intelligibility, it is first necessary to identify the type of distortion (attenuation or amplification distortion) that degrades the intelligibility of the enhanced speech. Therefore, the enhanced speech is analyzed at the phoneme level using segmental SNR, and it is observed that in the high-SNR regions of the noisy speech (specifically vowels, liquids and nasals), intelligibility is reduced by amplification distortion. This may be due to the high spectral resolution of the temporal-domain ERB-based filters. Hence, to improve intelligibility, a set of formant-specific filters is proposed, based on a formant analysis carried out over vowels, liquids and nasals. The quality and intelligibility of the proposed multi-band spectral subtraction algorithm are evaluated using subjective (MOS) and objective (PESQ and CSII) measures, for speech corrupted by white, car and babble noise at SNR levels from -5 to 15 dB. Compared with the conventional multi-band spectral subtraction method, the proposed method improves speech quality by around 0.1 to 0.5 in terms of PESQ and intelligibility by 2 to 10% in terms of CSII.
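The conventional multi-band spectral subtraction method that this work builds on processes each frame band by band. A minimal per-frame sketch follows, assuming the formulation commonly reported for Kamath and Loizou's method: an over-subtraction factor alpha that shrinks as the band SNR rises, an extra band factor delta (1, 2.5 or 1.5 depending on the band's upper frequency), and flooring of negative results at beta times the noisy power with beta = 0.002. These constants, the function names and the half-open bin-index band layout are assumptions for illustration, not this paper's settings; noise estimation, windowing and overlap-add resynthesis are omitted.

```python
import math


def oversubtraction_factor(band_snr_db):
    """Piecewise-linear over-subtraction factor alpha from the band SNR in dB (assumed rule)."""
    if band_snr_db < -5.0:
        return 4.75
    if band_snr_db > 20.0:
        return 1.0
    return 4.0 - (3.0 / 20.0) * band_snr_db


def band_factor(f_upper_hz, fs):
    """Extra per-band factor delta chosen from the band's upper frequency (assumed values)."""
    if f_upper_hz <= 1000.0:
        return 1.0
    if f_upper_hz <= fs / 2.0 - 2000.0:
        return 2.5
    return 1.5


def multiband_subtract(noisy_pow, noise_pow, band_edges_bins, fs, beta=0.002):
    """Subtract a noise power estimate band by band from one frame's power spectrum.

    noisy_pow / noise_pow hold |Y(k)|^2 and |D(k)|^2 for bins k = 0..N/2.
    band_edges_bins lists band boundaries as bin indices; band i covers
    [edges[i], edges[i+1]). Bins outside every band are returned unchanged.
    """
    n_fft = 2 * (len(noisy_pow) - 1)
    eps = 1e-12  # guards against log of zero for silent bands
    out = list(noisy_pow)
    for lo, hi in zip(band_edges_bins, band_edges_bins[1:]):
        band_snr_db = 10.0 * math.log10(
            (sum(noisy_pow[lo:hi]) + eps) / (sum(noise_pow[lo:hi]) + eps)
        )
        alpha = oversubtraction_factor(band_snr_db)
        delta = band_factor((hi - 1) * fs / n_fft, fs)  # frequency of the band's top bin
        for k in range(lo, hi):
            s = noisy_pow[k] - alpha * delta * noise_pow[k]
            out[k] = s if s > 0.0 else beta * noisy_pow[k]  # spectral floor
    return out
```

With 8 kHz sampling and an 8-point FFT (bins every 1 kHz), `multiband_subtract([10.0] * 5, [1.0] * 5, [0, 2, 5], 8000)` gives a 10 dB SNR in both bands (alpha = 2.5) but a larger delta in the upper band, so the upper bins are attenuated more.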
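The abstract does not give the design of the temporal-domain ERB-based filters. The sketch below is one plausible reading, assuming the Glasberg and Moore ERB formulas, centre frequencies spaced uniformly on the ERB-rate scale, and a second-order band-pass biquad from the RBJ Audio EQ Cookbook whose bandwidth is one ERB (Q = f0 / ERB(f0)). The band count, frequency range, filter order and every function name are hypothetical choices, not the authors' filters.

```python
import math


def hz_to_erb_rate(f_hz):
    """Glasberg and Moore ERB-rate: number of ERBs below f_hz."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter centred at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_center_frequencies(f_low, f_high, n_bands):
    """n_bands (>= 2) centre frequencies equally spaced on the ERB-rate scale."""
    e_lo, e_hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (e_hi - e_lo) / (n_bands - 1)
    return [erb_rate_to_hz(e_lo + i * step) for i in range(n_bands)]


def bandpass_biquad(x, f0, fs):
    """Time-domain 2nd-order band-pass (RBJ cookbook, 0 dB peak gain) one ERB wide."""
    q = f0 / erb_bandwidth(f0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 = 0 for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def erb_filterbank(x, fs, f_low=100.0, f_high=3500.0, n_bands=8):
    """Split x into sub-band signals, one per ERB-spaced centre frequency."""
    return [bandpass_biquad(x, fc, fs) for fc in erb_center_frequencies(f_low, f_high, n_bands)]
```

Because each filter is about one ERB wide, the lowest bands are only a few tens of hertz wide, which gives some intuition for the fine spectral resolution the abstract associates with amplification distortion. Higher-order filters would give sharper band edges than this second-order design.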
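The distinction between attenuation and amplification distortion can be made concrete per time-frequency bin by comparing the enhanced magnitude with the clean magnitude. In this sketch a bin is labelled attenuation when the enhanced magnitude does not exceed the clean one, and amplification otherwise; amplification is further split at 6.02 dB (enhanced magnitude more than twice the clean magnitude), following the region boundaries reported in Loizou and Kim's analysis of why enhancement algorithms fail to improve intelligibility. The label strings, function names and the threshold default are assumptions, not this paper's procedure.

```python
import math
from collections import Counter

LABELS = ("attenuation", "amplification_low", "amplification_high")


def classify_distortion(clean_mag, enhanced_mag, amp_limit_db=6.02):
    """Label each time-frequency bin by the type of distortion the enhancement introduced."""
    if len(clean_mag) != len(enhanced_mag):
        raise ValueError("magnitude arrays must have the same length")
    labels = []
    for s, s_hat in zip(clean_mag, enhanced_mag):
        if s_hat <= s:
            labels.append("attenuation")  # enhanced magnitude at or below the clean one
        else:
            gain_db = 20.0 * math.log10(s_hat / s) if s > 0.0 else math.inf
            labels.append("amplification_high" if gain_db > amp_limit_db else "amplification_low")
    return labels


def distortion_fractions(labels):
    """Fraction of bins carrying each distortion label."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {k: 0.0 for k in LABELS}
    return {k: counts[k] / total for k in LABELS}
```

Applied to the bins of, say, a vowel segment, a large `amplification_high` fraction would point to the kind of amplification distortion the abstract reports for high-SNR regions.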
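Segmental SNR averages frame-level SNRs between the clean and enhanced signals. The sketch assumes 160-sample frames with an 80-sample hop (20 ms frames with 50% overlap at 8 kHz) and clamps each frame's SNR to [-10, 35] dB, a common convention in objective-measure evaluations; the paper's exact framing is not stated, and the function name is hypothetical.

```python
import math


def segmental_snr(clean, enhanced, frame_len=160, hop=80, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs (dB) between clean and enhanced signals, each clamped."""
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    eps = 1e-12  # avoids division by zero for silent or perfectly matched frames
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        frame_snrs.append(min(max(snr_db, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)
```

For a phoneme-level analysis, the function would be applied to the samples of each phoneme, e.g. `segmental_snr(clean[start:end], enhanced[start:end], frame_len=80, hop=40)`, where `start` and `end` come from a hypothetical time alignment and shorter frames accommodate short phonemes.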
