Multiclass SVM based spoken Hindi numerals recognition

This paper presents the recognition of isolated Hindi numerals using a multiclass Support Vector Machine (SVM). Three sets of acoustic features are considered as inputs to the SVM: Linear Predictive Coding (LPC) coefficients, Mel-Frequency Cepstral Coefficients (MFCC), and a combination of LPC and MFCC. Classification is performed in two steps. In the first step, a one-versus-all SVM classifier is used to identify the Hindi language. In the second step, ten one-versus-all classifiers are used to recognize the numerals. Linear, polynomial and RBF kernels are used to construct the SVMs. In the first phase of experiments, the best kernel strategy was explored for a fixed number of frames of the speech signal; the highest recognition rate was achieved with the linear kernel. Next, the number of frames used to calculate the LPCs and MFCCs was varied and the recognition accuracy was measured. The highest recognition accuracy achieved in this study is 96.8%.
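As an illustration of the second classification step, the following is a minimal sketch of ten one-versus-all linear SVMs, one per numeral, with the predicted numeral taken as the classifier with the largest decision score. It is not the authors' implementation. The synthetic feature vectors stand in for LPC/MFCC features, and the feature dimension, the stochastic subgradient training on the hinge loss, and all names and hyperparameters (`make_data`, `train_binary_svm`, `lr`, `lam`) are assumptions made for this example. Only the linear kernel is shown.

```python
import random

NUM_CLASSES = 10  # numerals 0-9
DIM = 10          # hypothetical feature dimension (stand-in for LPC/MFCC vectors)


def make_data(n_per_class, seed):
    """Synthetic stand-in features (assumption): class k clusters around 3 * e_k."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(NUM_CLASSES):
        for _ in range(n_per_class):
            X.append([(3.0 if d == k else 0.0) + rng.gauss(0.0, 0.3)
                      for d in range(DIM)])
            y.append(k)
    return X, y


def score(model, x):
    """Linear decision function w . x + b."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_binary_svm(X, targets, epochs=30, lr=0.01, lam=0.01, seed=0):
    """Linear SVM trained by stochastic subgradient descent on the
    L2-regularised hinge loss. targets are +1.0 or -1.0."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = targets[i] * score((w, b), X[i]) < 1.0
            # Shrink the weights (gradient of the L2 regulariser).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss subgradient step for a margin violation.
                w = [wj + lr * targets[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * targets[i]
    return w, b


def train_one_vs_all(X, y):
    """One binary classifier per numeral: class k versus all other classes."""
    return [train_binary_svm(X, [1.0 if label == k else -1.0 for label in y])
            for k in range(NUM_CLASSES)]


def predict(models, x):
    """Pick the numeral whose classifier gives the largest decision score."""
    scores = [score(m, x) for m in models]
    return scores.index(max(scores))


X_train, y_train = make_data(30, seed=1)
X_test, y_test = make_data(10, seed=2)
models = train_one_vs_all(X_train, y_train)
accuracy = sum(predict(models, x) == t
               for x, t in zip(X_test, y_test)) / len(y_test)
print(f"one-versus-all models: {len(models)}, test accuracy: {accuracy:.2f}")
```

The accuracy printed here reflects only the well-separated synthetic clusters and says nothing about the recognition rates reported in the paper. A polynomial or RBF kernel would require a kernelised solver in place of the primal linear update used in this sketch.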
