DSCC features for Hindi vowel classification

Mel-frequency cepstral coefficients (MFCC) are a popular feature extraction technique for speech recognition, and many researchers have worked with variations of them. The Delta Spectral Cepstral Coefficient (DSCC) is a more recent feature extraction technique for speech recognition. It is similar in approach to MFCC but gives much better recognition accuracy in noisy and dynamically changing environments, which makes DSCC features more suitable for real-life speech recognition applications. Here we evaluate both DSCC and MFCC features on a Hindi vowel classification task, using a Hidden Markov Model as the classifier. DSCC features improved the classification efficiency by 6.16% and 5.186% for car noise at −5 dB and 10 dB, respectively.
