A Priori SNR Computation for Speech Enhancement Based on Cepstral Envelope Estimation

In this contribution we present our latest investigation and analysis of a novel a priori SNR estimator for speech enhancement. It is based on estimating the clean spectral envelope in the cepstral domain with a deep neural network (DNN). Furthermore, by integrating our cepstral excitation manipulation (CEM) approach into this framework, we not only obtain a smooth and natural-sounding background noise, but also achieve noise reduction between the harmonics, which is not possible with low-order models. We evaluate the proposed approach in conjunction with three different spectral weighting rules and show an improvement of more than 3.5 dB in noise attenuation over the well-known decision-directed (DD) approach, without a significant trade-off in speech distortion.
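For orientation, the decision-directed (DD) baseline mentioned above can be sketched as follows. This is a minimal illustration of the classical DD recursion combined with a Wiener weighting rule, not the proposed DNN/CEM estimator. The function name, the smoothing factor `alpha = 0.98`, the SNR floor of -15 dB, and the zero initialization of the recursion are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np


def decision_directed_snr(noisy_power, noise_power, alpha=0.98, xi_min_db=-15.0):
    """Classical decision-directed a priori SNR estimate with a Wiener gain.

    noisy_power: array of shape (frames, bins) holding |Y(l, k)|^2.
    noise_power: noise PSD estimate, broadcastable to noisy_power.
    alpha, xi_min_db: illustrative defaults (assumptions, not from the paper).
    Returns (xi, gain), both of shape (frames, bins).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.broadcast_to(
        np.asarray(noise_power, dtype=float), noisy_power.shape
    )
    xi_min = 10.0 ** (xi_min_db / 10.0)

    xi = np.empty_like(noisy_power)
    gain = np.empty_like(noisy_power)
    # |S_hat(l-1, k)|^2 / sigma_n^2 of the previous frame; zero at start (assumption).
    prev_ratio = np.zeros(noisy_power.shape[1])

    for l in range(noisy_power.shape[0]):
        gamma = noisy_power[l] / noise_power[l]      # a posteriori SNR
        ml_term = np.maximum(gamma - 1.0, 0.0)       # instantaneous (ML) estimate
        # Weighted sum of the previous clean-speech estimate and the ML term.
        xi[l] = np.maximum(alpha * prev_ratio + (1.0 - alpha) * ml_term, xi_min)
        gain[l] = xi[l] / (1.0 + xi[l])              # Wiener weighting rule
        prev_ratio = gain[l] ** 2 * gamma            # feeds the next frame

    return xi, gain


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for |Y|^2: exponentially distributed noise power.
    noisy = rng.exponential(scale=1.0, size=(50, 129))
    xi, gain = decision_directed_snr(noisy, noise_power=np.ones(129))
    print(xi.shape, float(gain.min()), float(gain.max()))
```

In this sketch the a priori SNR of each frame depends on the enhanced spectrum of the previous frame, which is the source of the one-frame delay and the smoothing behaviour typically associated with the DD approach. An instantaneous estimator, as targeted by the envelope and excitation modelling described above, would replace the recursion inside the loop while the weighting rule could stay unchanged.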
