Improved Empirical Mode Decomposition Using Optimal Recursive Averaging Noise Estimation for Speech Enhancement

This paper presents a new approach to robust speech enhancement based on improved ensemble empirical mode decomposition (EMD) combined with optimized log-spectral amplitude noise estimation. The noisy signal is adaptively decomposed into a sum of oscillating components called intrinsic mode functions (IMFs). Each IMF is then enhanced separately, and a Hurst exponent-based method uses the resulting less-corrupted IMFs to construct an estimate of the clean signal. The framework relies on improved minima-controlled recursive averaging (IMCRA) for adaptive noise estimation and on the optimally modified log-spectral amplitude (OM-LSA) estimator to enhance the noisy EMD components. Objective evaluations of quality and intelligibility show that the proposed method performs significantly better than the baseline techniques, including the most recently developed EMD-based speech enhancement methods.
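The recursive averaging idea behind the noise estimator can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only the time-varying smoothing update used in minima-controlled recursive averaging, where the noise power estimate in each frequency bin is updated quickly when speech is unlikely and frozen when speech is likely. The function name `update_noise_psd`, the smoothing constant `alpha_d = 0.95`, and the externally supplied speech-presence probabilities are assumptions made for illustration; the improved variant additionally tracks spectral minima and applies bias compensation, which are omitted here.

```python
def update_noise_psd(noise_psd, noisy_power, speech_prob, alpha_d=0.95):
    """One frame of time-varying recursive averaging over all frequency bins.

    noise_psd   : previous noise power estimate per bin
    noisy_power : |Y(k, l)|^2 of the current noisy frame per bin
    speech_prob : assumed speech-presence probability per bin, in [0, 1]
    alpha_d     : illustrative base smoothing constant (an assumption)
    """
    updated = []
    for lam, y2, p in zip(noise_psd, noisy_power, speech_prob):
        # The effective smoothing factor moves toward 1 as speech becomes
        # more likely, which freezes the noise estimate during speech.
        alpha = alpha_d + (1.0 - alpha_d) * p
        updated.append(alpha * lam + (1.0 - alpha) * y2)
    return updated


# Hypothetical two-bin example: bin 0 contains noise only, bin 1 contains speech.
estimate = [1.0, 1.0]
for _ in range(200):
    estimate = update_noise_psd(estimate, [4.0, 4.0], [0.0, 1.0])
print(estimate)
```

In this toy run the estimate in the noise-only bin drifts toward the observed power, while the estimate in the speech-dominated bin stays near its initial value. In the framework described above, an update of this kind would be applied to the spectrum of each IMF before the gain function is computed.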
