Obstruent classification using modulation spectrogram based features

This paper proposes a new feature extraction technique based on the modulation spectrogram. The modulation spectrogram yields a two-dimensional (2-D) feature set for each obstruent segment. Because the resulting feature vector is very high-dimensional, Higher Order Singular Value Decomposition (HOSVD) is used to reduce its size. The reduced feature vector is then passed to a classifier, which assigns each obstruent to one of three broad classes: stop, affricate, or fricative. Four-fold cross-validation experiments were conducted on the TIMIT database to measure accuracy for phoneme-level obstruent classification and for recognition of the manner of articulation of obstruents. With a 3-nearest-neighbor classifier, the proposed features achieve 92.22% and 94.85% accuracy on these two tasks, respectively, whereas Mel Frequency Cepstral Coefficients (MFCC) under the same experimental setup achieve 87.24% and 93.68% average classification accuracy, respectively.
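
The reduction and classification steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each segment is already a 2-D modulation-spectrogram matrix (acoustic frequency by modulation frequency), replaces TIMIT with synthetic data, and uses a simplified truncated HOSVD that keeps only the leading singular vectors of the two feature-mode unfoldings. The matrix sizes, retained ranks, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_bases(train, ranks):
    """Leading left singular vectors of the two feature-mode unfoldings.

    train: array of shape (n_segments, n_acoustic, n_modulation)
    ranks: (r_acoustic, r_modulation), the number of vectors kept per mode
    """
    bases = []
    for mode, r in zip((1, 2), ranks):
        u, _, _ = np.linalg.svd(unfold(train, mode), full_matrices=False)
        bases.append(u[:, :r])
    return bases

def project(x, bases):
    """Project each segment matrix onto the truncated bases and flatten it."""
    u_a, u_m = bases
    z = np.einsum('nam,ai,mj->nij', x, u_a, u_m)
    return z.reshape(len(x), -1)

def knn_predict(train_z, train_y, test_z, k=3):
    """Plain k-nearest-neighbor majority vote with Euclidean distance."""
    dist = np.linalg.norm(test_z[:, None, :] - train_z[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    votes = train_y[nearest]
    return np.array([np.bincount(v, minlength=3).argmax() for v in votes])

# Synthetic stand-in for the data (assumption): one rank-1 template per class,
# with labels 0 = stop, 1 = affricate, 2 = fricative, plus Gaussian noise.
rng = np.random.default_rng(0)
n_acoustic, n_modulation = 16, 12
a = rng.normal(size=(3, n_acoustic))
m = rng.normal(size=(3, n_modulation))
templates = np.einsum('ca,cm->cam', a, m)

def make_split(n_per_class):
    y = np.repeat(np.arange(3), n_per_class)
    noise = rng.normal(size=(len(y), n_acoustic, n_modulation))
    return templates[y] + 0.3 * noise, y

train_x, train_y = make_split(40)
test_x, test_y = make_split(20)

bases = hosvd_bases(train_x, ranks=(4, 4))   # bases come from training data only
train_z = project(train_x, bases)
test_z = project(test_x, bases)

pred = knn_predict(train_z, train_y, test_z, k=3)
accuracy = float(np.mean(pred == test_y))
print(train_x[0].size, "->", train_z.shape[1], "features; accuracy:", accuracy)
```

Each 16 x 12 matrix (192 values) is reduced to a 4 x 4 core (16 values) before the 3-nearest-neighbor vote. In a cross-validation setting such as the four-fold protocol described above, the bases would be recomputed from the training folds of each split so that no test data influences the projection.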
