A better decomposition of speech obtained using modified Empirical Mode Decomposition

Abstract The objective of this work is to obtain meaningful time-domain components, or Intrinsic Mode Functions (IMFs), of the speech signal using Empirical Mode Decomposition (EMD), with reduced mode mixing and in a time-efficient manner. The work focuses on two aspects: first, extracting IMFs of the speech signal that better reflect its higher-frequency spectrum; and second, obtaining a better representation and distribution of the vocal tract resonances of the speech signal in its IMFs than standard EMD provides. To this end, modifications to the EMD algorithm are proposed for processing speech signals, based on the critical nature of the interpolation points (IPs) used for cubic spline interpolation in EMD. The effect of using sets of IPs other than the extrema of the residue, which standard EMD uses, is analyzed. It is found that having more IPs is beneficial only up to a certain limit, after which the characteristic dyadic filterbank nature of EMD breaks down. For certain sets of IPs, the modified EMD processes perform better than EMD, giving better frequency separability between the IMFs and an enhanced representation of the higher-frequency content of the signal. A detailed study of the distribution of the formants in the IMFs of the speech signal is carried out using Linear Prediction (LP) analysis of the IMFs. It is found that the IMFs of the EMD variants have a far better distribution of the formant structure within them, with reduced overlap among their filter spectra, compared to those of standard EMD. Hence, when applied to formant estimation of voiced speech using LP analysis, the IMFs of the modified EMD processes cumulatively outperform those of standard EMD, and the speech signal itself, under both clean and noisy conditions.
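
The role of the interpolation points can be illustrated with a minimal sketch of a single sifting step, in which the mean of the upper and lower cubic-spline envelopes is subtracted from the signal. This is not the authors' implementation: the function names `extrema_ips` and `sift_once`, and the pluggable `find_ips` argument, are assumptions introduced here for illustration. The default reproduces the standard EMD choice (extrema of the current signal), and boundary handling is simplified to plain spline extrapolation.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def extrema_ips(x):
    """Standard EMD choice of IPs: indices of strict local maxima and minima."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def sift_once(x, find_ips=extrema_ips):
    """One sifting step: subtract the mean of the two spline envelopes.

    find_ips is a hypothetical hook: any function returning the index sets
    through which the upper and lower envelopes are interpolated.
    """
    t = np.arange(len(x))
    upper_idx, lower_idx = find_ips(x)
    if len(upper_idx) < 2 or len(lower_idx) < 2:
        raise ValueError("too few interpolation points to build envelopes")
    # Cubic splines through the chosen IPs; values outside the first and
    # last IP are extrapolated (a simplification of real edge handling).
    upper = CubicSpline(upper_idx, x[upper_idx])(t)
    lower = CubicSpline(lower_idx, x[lower_idx])(t)
    return x - 0.5 * (upper + lower)


if __name__ == "__main__":
    n = np.arange(400)
    # Toy two-tone signal standing in for a speech frame (assumed data).
    x = np.sin(2 * np.pi * 0.05 * n) + 0.5 * np.sin(2 * np.pi * 0.005 * n)
    h = sift_once(x)
    print(h.shape)
```

Passing a different `find_ips` function is one way to experiment with alternative IP sets; a full decomposition would repeat this step until a stopping criterion is met and then continue on the residue.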
