Characterizing glottal activity from speech using Empirical Mode Decomposition

Glottal activity is an important aspect of speech production that results in voiced speech, and localizing voiced regions in order to compute parameters of the excitation source is useful in many speech processing applications. This paper investigates the ability of Empirical Mode Decomposition (EMD) and its noise-assisted variants to characterize glottal activity from the speech signal. A pair of consecutive Intrinsic Mode Functions (IMFs) obtained from the decomposition is found to reflect the periodic nature of different voiced regions of the speech signal. This IMF pair is used to construct a signal, named the Glottal Intrinsic Mode Function (GIMF), which represents most of the voiced speech regions. To measure how well the GIMF represents glottal activity, it is applied to three tasks: Glottal Activity Detection (GAD), pitch frequency (F0) tracking, and pitch marker detection. The results confirm that EMD localizes glottal activity within a small subset of IMFs, and suggest that source information can be extracted accurately from voiced speech with simple signal processing procedures.
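As a rough illustration of why a near-periodic signal such as the GIMF lends itself to simple processing, the sketch below estimates F0 and candidate pitch markers from negative-to-positive zero crossings. It is not the procedure used in the paper: the sample-wise sum used to combine the IMF pair, the function names (`combine_imf_pair`, `rising_zero_crossings`, `estimate_f0`), and the synthetic test signal are all assumptions made for illustration, and the two sinusoids merely stand in for IMFs that an EMD implementation would produce.

```python
import math

def combine_imf_pair(imf_a, imf_b):
    """Combine two consecutive IMFs by sample-wise summation.

    Assumption: summation is used here only for illustration; the
    abstract does not state how the GIMF is built from the pair.
    """
    return [a + b for a, b in zip(imf_a, imf_b)]

def rising_zero_crossings(signal):
    """Return indices n where the signal goes from negative to non-negative."""
    return [n for n in range(1, len(signal)) if signal[n - 1] < 0 <= signal[n]]

def estimate_f0(signal, fs):
    """Estimate F0 (Hz) from the mean spacing of rising zero crossings.

    Returns None when fewer than two crossings are found (for example,
    in an unvoiced or silent segment).
    """
    marks = rising_zero_crossings(signal)
    if len(marks) < 2:
        return None
    mean_period = (marks[-1] - marks[0]) / (len(marks) - 1)  # in samples
    return fs / mean_period

# Synthetic stand-ins for a consecutive IMF pair (hypothetical data):
# a dominant 125 Hz component plus a weaker 62.5 Hz component.
fs = 8000
n_samples = 2000  # 0.25 s
imf_a = [math.sin(2 * math.pi * 125.0 * n / fs + 0.5) for n in range(n_samples)]
imf_b = [0.2 * math.sin(2 * math.pi * 62.5 * n / fs) for n in range(n_samples)]

gimf_like = combine_imf_pair(imf_a, imf_b)
pitch_marks = rising_zero_crossings(gimf_like)  # candidate pitch markers
f0_estimate = estimate_f0(gimf_like, fs)
print(f"{len(pitch_marks)} candidate markers, F0 estimate: {f0_estimate:.1f} Hz")
```

On real speech, the two input sequences would come from an EMD, EEMD, or CEEMDAN decomposition, and the estimate would be computed frame by frame over regions already detected as voiced.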
