Empirical Mode Decomposition for Advanced Speech Signal Processing

Empirical mode decomposition (EMD) is a relatively new tool for analyzing nonlinear and non-stationary signals. It decomposes any signal into a finite number of time-varying subband signals termed intrinsic mode functions (IMFs). Such data-adaptive decomposition has recently been used in speech enhancement. This study presents the concept of EMD and its application to advanced speech signal processing paradigms, including speech enhancement by soft-thresholding, voiced/unvoiced (V/Uv) speech discrimination, and pitch estimation. Speech processing is frequently performed in a transformed domain, and the transformation is usually achieved with traditional signal analysis techniques, i.e., Fourier and wavelet transforms. These methods employ a priori basis functions and are therefore not well suited to data-adaptive analysis of non-stationary signals such as speech. Recently, EMD has attracted much attention as a data-adaptive approach to speech signal processing. Several EMD-based soft-thresholding algorithms for speech enhancement are discussed here. V/Uv discrimination is an important concern in speech processing. It is usually performed using acoustic features, with training data used to determine the classification threshold. An EMD-based data-adaptive thresholding approach is developed for V/Uv discrimination without any training phase. Noticeable improvement is achieved by applying EMD to pitch estimation of noisy speech signals. Related experimental results are also presented to demonstrate the effectiveness of EMD in advanced speech processing algorithms.
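
As a rough illustration of the soft-thresholding operation mentioned above, the following minimal sketch shrinks the samples of a single subband signal (for example, one IMF) toward zero. The function names and the toy input are assumptions made for illustration only. The threshold rule shown is the classical universal threshold from the wavelet de-noising literature, not necessarily the adaptive rules used by the EMD-based algorithms discussed in this study.

```python
import math


def soft_threshold(samples, threshold):
    """Soft-threshold a sequence: values with |x| <= threshold become 0,
    all others are shrunk toward zero by `threshold`."""
    out = []
    for x in samples:
        magnitude = abs(x) - threshold
        # Keep the sign of x, reduce its magnitude; zero out small values.
        out.append(math.copysign(magnitude, x) if magnitude > 0 else 0.0)
    return out


def universal_threshold(sigma, n):
    """Classical universal threshold sigma * sqrt(2 ln n).
    Assumption: `sigma` is an externally supplied noise standard deviation."""
    return sigma * math.sqrt(2.0 * math.log(n))


# Hypothetical toy subband signal, standing in for one IMF.
toy_imf = [3.0, -0.5, -2.0, 0.2]
denoised = soft_threshold(toy_imf, 1.0)
```

In an EMD-based enhancement scheme, an operation of this kind would be applied to each IMF (or to frames of it) before the thresholded components are summed to reconstruct the signal; how the threshold is chosen per IMF is the part that differs between algorithms.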
