Classification of phonemes using modulation spectrogram based features for Gujarati language

In this paper, features extracted from the modulation spectrogram are used to classify phonemes of the Gujarati language. The modulation spectrogram, a two-dimensional (2-D) feature representation, is reduced to a lower-dimensional feature vector by the proposed feature extraction method. A Gujarati speech database was manually segmented into 31 phoneme classes, and these phonemes are classified using a support vector machine (SVM) classifier. The proposed features achieve a phoneme classification accuracy of 94.5%, compared with 92.74% for the state-of-the-art Mel frequency cepstral coefficient (MFCC) feature set. Classification accuracy is also determined for the broad phoneme classes, viz., vowels, stops, nasals, semivowels, affricates, and fricatives. With the proposed feature set, phonemes are assigned to their respective broad classes with 95.03% accuracy. Fusing MFCC with the proposed feature set performs better still, giving a phoneme classification accuracy of 95.7%. With the fused features, classification of phonemes into sonorant and obstruent classes is 97.01% accurate.
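The abstract does not spell out how the modulation spectrogram is computed or reduced. The sketch below shows one common way to obtain a 2-D modulation spectrogram: take the magnitude of the short-time Fourier transform, then apply a second Fourier transform along time to each acoustic-frequency envelope. This yields an acoustic frequency by modulation frequency array. It is followed by a hypothetical grid-averaging step that merely stands in for the paper's proposed reduction, which is not described here. The frame length, hop size, per-band mean removal, band counts, and the function names `modulation_spectrogram` and `pool_features` are all illustrative assumptions, not the authors' settings.

```python
import numpy as np

def modulation_spectrogram(x, frame_len=256, hop=64):
    """Sketch of a 2-D modulation spectrogram (assumed parameters).

    Step 1: magnitude STFT |X(t, k)|, k = acoustic frequency bin.
    Step 2: FFT of each |X(., k)| along time, giving modulation frequency m.
    Returns an array of shape (acoustic bins, modulation bins).
    """
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Subband magnitude envelopes: shape (n_frames, frame_len // 2 + 1)
    envelope = np.abs(np.fft.rfft(frames, axis=1))
    # Assumption: remove each envelope's mean so the 0 Hz modulation term
    # does not dominate the pooled features
    envelope -= envelope.mean(axis=0, keepdims=True)
    # Second FFT along the time (frame) axis -> modulation frequency
    mod = np.abs(np.fft.rfft(envelope, axis=0))
    return mod.T

def pool_features(mod_spec, n_acoustic=8, n_modulation=4):
    """Hypothetical reduction (NOT the paper's method): average the 2-D
    modulation spectrogram over a coarse grid of bands, then log-compress."""
    feats = [block.mean()
             for rows in np.array_split(mod_spec, n_acoustic, axis=0)
             for block in np.array_split(rows, n_modulation, axis=1)]
    return np.log(np.array(feats) + 1e-10)

# Usage: a 1 kHz tone amplitude-modulated at 4 Hz, sampled at 8 kHz for 0.5 s
fs = 8000
t = np.arange(fs // 2) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
M = modulation_spectrogram(x)   # shape (129, 30)
feats = pool_features(M)        # 8 x 4 = 32 values
# Peak expected at acoustic bin 32 (1000 Hz) and modulation bin 2 (about 4.2 Hz)
peak = np.unravel_index(np.argmax(M), M.shape)
```

In a full classifier, a vector like `feats` would be computed for each phoneme segment and passed to an SVM. For the MFCC fusion result, concatenating the MFCC and modulation feature vectors before training (for example with scikit-learn's `SVC`) is one plausible option, though the abstract does not state which fusion scheme the authors used.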