Unscented Transform-Based Dual-Channel Noise Estimation: Application to Speech Enhancement on Smartphones

Several speech processing approaches rely on a noise estimation stage, and their performance depends on the accuracy of the noise estimates. In this paper, we propose a novel minimum mean square error (MMSE) noise estimator that exploits dual-channel noisy observations to avoid the need for a clean speech model while keeping a simple formulation. The parameters of this estimator are obtained through the unscented transform (UT), which yields more accurate statistics, and does so more efficiently, than classical vector Taylor series (VTS) linearization. For evaluation, we consider speech enhancement on a dual-microphone smartphone in close-talk conditions, a particular application of interest. Results show that our proposal outperforms other single- and dual-channel noise estimation methods, both in noise estimation accuracy and in the quality and intelligibility of the enhanced speech signal.
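To illustrate how the UT obtains statistics of a nonlinearly transformed Gaussian without linearization, the following is a minimal sketch and not the estimator proposed in the paper. It assumes independent Gaussian inputs (diagonal covariance), the basic symmetric sigma-point set with a hypothetical spread parameter `kappa`, and, purely as an example nonlinearity, a log-domain sum of two sources (`log_sum`). The function names are illustrative assumptions.

```python
import math


def unscented_transform(mean, var, f, kappa=1.0):
    """Approximate the mean and variance of f(x) for Gaussian x.

    mean, var: per-dimension means and variances (inputs assumed independent).
    f: function mapping a list of floats to a float.
    kappa: spread parameter of the sigma-point set (assumed tuning value).
    """
    n = len(mean)
    scale = n + kappa
    # Sigma points: the mean itself plus symmetric offsets along each axis.
    points = [list(mean)]
    weights = [kappa / scale]
    for i in range(n):
        offset = math.sqrt(scale * var[i])
        for sign in (1.0, -1.0):
            p = list(mean)
            p[i] += sign * offset
            points.append(p)
            weights.append(1.0 / (2.0 * scale))
    # Push every sigma point through the nonlinearity, then take
    # weighted sample statistics of the transformed points.
    values = [f(p) for p in points]
    out_mean = sum(w * v for w, v in zip(weights, values))
    out_var = sum(w * (v - out_mean) ** 2 for w, v in zip(weights, values))
    return out_mean, out_var


def log_sum(p):
    """Example nonlinearity (an assumption): log of the sum of two
    powers given in the log domain, computed in a numerically stable way."""
    a, b = p
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


if __name__ == "__main__":
    # Two log-domain sources with illustrative means and variances.
    m, v = unscented_transform([0.0, -1.0], [1.0, 0.5], log_sum)
    print(m, v)
```

Unlike a first-order VTS expansion, which evaluates the function and its Jacobian only at the mean, this sketch evaluates the function itself at 2n + 1 points and needs no derivatives. For a linear function the resulting mean and variance are exact.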
