Urdu Speech Corpus and Preliminary Results on Speech Recognition

Language resources for Urdu are not well developed. In this work, we describe the development of an Urdu speech corpus of isolated words. The corpus comprises 250 isolated Urdu words recorded by ten speakers, who include native and non-native, male and female individuals. The corpus can be used for both speech and speaker recognition tasks. We also report results on an automatic speech recognition task using this corpus. The framework extracts Mel-frequency cepstral coefficients (MFCCs) along with their velocity and acceleration coefficients, which are then fed to different classifiers to perform recognition. The classifiers used are Support Vector Machines, Random Forest, and Linear Discriminant Analysis. Experimental results show that Support Vector Machines perform best, with a test-set accuracy of 73%. These results may provide a useful baseline for future research on automatic speech recognition of Urdu.
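
The velocity and acceleration coefficients mentioned above are commonly computed as first- and second-order regression deltas over the static MFCC frames. The following is a minimal sketch of that step, not the implementation used in this work: the regression formula, the window of 2 frames, the edge handling by frame replication, and the function names `delta` and `add_dynamic_features` are all assumptions made for illustration.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients (assumed formula, see lead-in).

    frames: list of per-frame coefficient lists, e.g. one MFCC vector per frame.
    Returns a list of the same shape holding the delta of each coefficient.
    """
    if not frames:
        return []
    num_frames = len(frames)
    # Normalisation term: 2 * sum of n^2 for n = 1..window.
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                # Replicate the first/last frame at the utterance edges.
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Append velocity (delta) and acceleration (delta of delta) to each frame."""
    velocity = delta(static)
    acceleration = delta(velocity)
    return [s + v + a for s, v, a in zip(static, velocity, acceleration)]
```

With, say, 13 static coefficients per frame (an illustrative number, not one stated here), `add_dynamic_features` yields 39 values per frame. How the per-frame vectors are turned into a fixed-length input for the classifiers is not specified above and is left out of the sketch.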
