Speaker accent recognition through statistical descriptors of Mel-bands spectral energy and neural network model

In recent years, accent recognition has become one of the most important topics in automatic speaker recognition and speaker-independent automatic speech recognition (SI-ASR) systems. Voice-controlled technologies have become part of daily life; nevertheless, variability in speech makes these spoken language technologies relatively difficult to build. One of the most profound sources of this variability is accent. By classifying accent types, accent-specific models could be developed to handle SI-ASR. In this paper, we classify three accents of English, recorded from speakers of the three main ethnic groups in Malaysia (Malay, Chinese and Indian), using an artificial neural network model. All experiments were performed in speaker-independent mode and in word-independent mode over the three most accent-sensitive words. Mel-band spectral energy was extracted from eighteen bands, and four statistical descriptors were computed for each speech sample (mean, standard deviation, kurtosis, and the ratio of standard deviation to kurtosis) to characterize the spectral energy distribution. The system was evaluated on an independent test dataset, a partially independent test dataset and the training dataset. The best three-class accuracy, 99.01%, was obtained on the independent test dataset. Averaged over several trials, the overall accuracy was 96.79%, with an average training time of 49 epochs.
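As a rough illustration of the feature-extraction step, the following is a minimal NumPy sketch that computes eighteen Mel-band energies per frame and then four statistics per band. The abstract does not specify the sampling rate, frame length, hop size, window, Mel formula, kurtosis definition, or the axis over which the statistics are taken, so the choices here (512-sample Hamming frames, 256-sample hop, HTK-style Mel scale, Pearson kurtosis, statistics taken over time frames for each band) are assumptions, and the function names `mel_filterbank`, `mel_band_energies` and `statistical_descriptors` are hypothetical. The neural network classifier itself is not sketched.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel formula (assumed; the paper does not state which variant it uses).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_bands, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale, shape (n_bands, n_fft//2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_bands + 2))
    fb = np.zeros((n_bands, n_bins))
    for i in range(n_bands):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_band_energies(signal, sr, n_bands=18, frame_len=512, hop=256):
    """Return an (n_frames, n_bands) matrix of Mel-band energies (assumed framing)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2
    return power @ mel_filterbank(n_bands, frame_len, sr).T


def statistical_descriptors(energies, eps=1e-12):
    """Per-band mean, std, kurtosis and std/kurtosis, grouped by statistic.

    For 18 bands this yields a 72-element vector:
    [18 means, 18 stds, 18 kurtoses, 18 std/kurtosis ratios].
    """
    mean = energies.mean(axis=0)
    std = energies.std(axis=0)
    centered = energies - mean
    # Pearson (non-excess) kurtosis E[(x - mu)^4] / sigma^4; assumed definition.
    kurt = (centered ** 4).mean(axis=0) / (std ** 4 + eps)
    ratio = std / (kurt + eps)
    return np.concatenate([mean, std, kurt, ratio])


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr  # one second of audio
    rng = np.random.default_rng(0)
    x = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)
    feats = statistical_descriptors(mel_band_energies(x, sr))
    print(feats.shape)  # (72,)
```

Pearson kurtosis is used in this sketch because it is at least 1 for any non-constant band, so the standard-deviation-to-kurtosis ratio stays well behaved; with excess kurtosis the denominator could approach zero. The resulting vector would then serve as the input to a three-class neural network classifier.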