Novel Temporal and Spectral Features Derived from TEO for Classification of Normal and Dysphonic Voices

In this paper, temporal features (namely, zero-crossing rate and short-time energy) and spectral features (namely, spectral flux and spectral centroid) are derived from the Teager energy operator (TEO) profile of the speech waveform. The efficacy of these features for classifying normal and dysphonic voices is analyzed by comparing their performance with that of features derived from the linear prediction (LP) residual and from the speech waveform. In addition, fusing these features with the state-of-the-art Mel frequency cepstral coefficient (MFCC) feature set is investigated to determine whether they provide complementary information. A second-order polynomial classifier is used, and experiments are carried out on a subset of the Massachusetts Eye and Ear Infirmary (MEEI) database.
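The abstract does not spell out how the features are computed, so the following is only a minimal sketch of the general idea, assuming common textbook definitions. It uses the standard discrete TEO, Ψ[x(n)] = x²(n) − x(n−1)x(n+1), and then computes per-frame features on the TEO profile instead of on the waveform. The frame length, hop, Hamming window, magnitude-weighted centroid, and squared-difference flux are all assumptions, and every function name here is hypothetical rather than taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete TEO: psi[n] = x[n]^2 - x[n-1]*x[n+1], for n = 1 .. N-2."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def split_frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Overlapping frames of length frame_len; a trailing partial frame is dropped."""
    return [list(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples in the frame (assumed definition)."""
    return sum(v * v for v in frame)


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """|DFT| of a Hamming-windowed frame for bins 0 .. N//2 (naive O(N^2) DFT)."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [s * w for s, w in zip(frame, window)]
    mags = []
    for k in range(n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(xw))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(xw))
        mags.append(math.hypot(re, im))
    return mags


def spectral_centroid(mags: Sequence[float], fs: float, n_fft: int) -> float:
    """Magnitude-weighted mean frequency in Hz (bin k sits at k * fs / n_fft)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * fs / n_fft * m for k, m in enumerate(mags)) / total


def spectral_flux(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Squared Euclidean distance between consecutive magnitude spectra (assumed form)."""
    return sum((c - p) ** 2 for p, c in zip(prev, curr))


def teo_features(x: Sequence[float], fs: float,
                 frame_len: int = 400, hop: int = 200) -> List[List[float]]:
    """Per-frame [ZCR, STE, spectral centroid, spectral flux] of the TEO profile of x."""
    psi = teager_energy(x)
    feats: List[List[float]] = []
    prev: Optional[List[float]] = None
    for frame in split_frames(psi, frame_len, hop):
        mags = magnitude_spectrum(frame)
        flux = 0.0 if prev is None else spectral_flux(prev, mags)  # no previous frame
        feats.append([zero_crossing_rate(frame), short_time_energy(frame),
                      spectral_centroid(mags, fs, frame_len), flux])
        prev = mags
    return feats
```

As a sanity check, for a pure tone A cos(Ωn) the TEO profile is the constant A² sin²(Ω), so its zero-crossing rate and frame-to-frame spectral flux are both zero. Whether the paper uses per-frame vectors directly or aggregates them before the polynomial classifier is not stated in the abstract. In practice, an FFT routine would replace the naive DFT used here for readability.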
