Accent Recognition System Using Deep Belief Networks for Telugu Speech Signals

Accent and emotion recognition from speech have become important research areas because of the growing demand for speech processing systems in handheld devices. However, most speech processing research has been carried out mainly for English. In this paper, we present an accent recognition system for Telugu speech. Three major Telugu accents were chosen, and text-dependent speech samples in the Coastal Andhra, Rayalaseema, and Telangana accents were collected. Features such as tonal power ratio, spectral flux, pitch chroma, and Mel-frequency cepstral coefficients (MFCCs) were extracted from these recordings, and deep belief networks were used for classification. The system achieves a recognition accuracy of 93%.
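The abstract names the features but not how they are computed. The following is a minimal, dependency-free sketch of per-frame spectral flux, tonal power ratio, and a 12-bin pitch chroma, using common textbook-style definitions. It is not the authors' implementation. All function names, the frame size and hop, the Hann window, the relative peak threshold, and the 130 to 5000 Hz chroma range are illustrative assumptions. A naive DFT keeps the example self-contained; a real pipeline would use an FFT. MFCCs are omitted here and are usually taken from an existing library such as librosa's `librosa.feature.mfcc`.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|X(k)| for k = 0 .. N/2 of a Hann-windowed frame (naive O(N^2) DFT, sketch only)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    return [abs(sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n // 2 + 1)]


def spectral_flux(prev_mag, mag):
    """Euclidean distance between consecutive magnitude spectra, normalised by bin count."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mag, prev_mag))) / len(mag)


def tonal_power_ratio(mag, rel_threshold=5e-4):
    """Share of frame power held by spectral peaks.

    Assumption: a bin is 'tonal' if it is a local maximum of the power
    spectrum and exceeds rel_threshold times the frame's total power.
    """
    power = [m * m for m in mag]
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: no tonal content
    tonal = sum(power[k] for k in range(1, len(power) - 1)
                if power[k] > power[k - 1] and power[k] > power[k + 1]
                and power[k] > rel_threshold * total)
    return tonal / total


def pitch_chroma(mag, sample_rate, n_fft, f_min=130.0, f_max=5000.0):
    """12-bin pitch-class profile (index 0 = C, 9 = A), normalised to sum to 1."""
    chroma = [0.0] * 12
    for k, m in enumerate(mag):
        f = k * sample_rate / n_fft
        if f > 0 and f_min <= f <= f_max:
            midi = 69 + 12 * math.log2(f / 440.0)  # MIDI pitch of this bin
            chroma[int(round(midi)) % 12] += m * m
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


def frame_features(signal, sample_rate, n_fft=256, hop=128):
    """One row per frame: [spectral flux, tonal power ratio, 12 chroma values]."""
    rows, prev = [], None
    for start in range(0, len(signal) - n_fft + 1, hop):
        mag = magnitude_spectrum(signal[start:start + n_fft])
        flux = spectral_flux(prev, mag) if prev is not None else 0.0
        rows.append([flux, tonal_power_ratio(mag)] + pitch_chroma(mag, sample_rate, n_fft))
        prev = mag
    return rows
```

For example, a 1024-sample 440 Hz sine sampled at 8 kHz yields seven 14-value rows whose chroma peaks at index 9 (pitch class A). In an accent recognition pipeline, rows like these (together with MFCCs) would be aggregated per utterance and passed to the classifier; how the paper aggregated them is not stated in the abstract.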
