Design of a real time automatic speech recognition system using Modified One Against All SVM classifier

Abstract: This paper proposes a real-time speech recognition system based on the Texas Instruments TMS320C6713 DSP and a Modified One Against All Support Vector Machine (SVM) classifier. Its major contributions are a study and evaluation of the classifier's performance with three feature extraction techniques, and a proposal for minimizing the classifier's computation time. The classifier achieves recognition accuracies of 93.33%, 98.67% and 96.67% using Mel Frequency Cepstral Coefficient (MFCC) features, zero-crossing (ZC) features and zero-crossing with peak amplitude (ZCPA) features, respectively. Two techniques are proposed to reduce the computation time of the systems: one uses an optimum threshold for the SVM classifier and the other uses linear assembly. The ZC-based system requires the least computation time, and the two techniques reduce its execution time by factors of 6.56 and 5.95, respectively. For comparison, the speech recognition system is also implemented on an Altera Cyclone II FPGA with a Nios II soft processor and custom instructions. Of the two approaches, the DSP approach requires 87.40% fewer clock cycles. A custom design of the recognition system on the FPGA, without the soft-core processor, would have resulted in lower computational complexity. The proposed classifier is also found to reduce the number of support vectors by a factor of 1.12–3.73 when applied to speaker identification and isolated letter recognition problems. The techniques proposed here can be adapted to various other SVM-based pattern recognition systems.
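As a rough illustration of the ideas named in the abstract, the Python sketch below counts zero crossings per frame and feeds the counts to a generic one-against-all decision rule. It is not the paper's implementation: the abstract does not specify the frame length, the form of the binary classifiers, or how the optimum threshold is applied. The function names, the non-overlapping framing and the early-exit use of `threshold` are all assumptions made here for illustration.

```python
from typing import Callable, List, Optional, Sequence


def zero_crossing_counts(signal: Sequence[float], frame_len: int) -> List[int]:
    """Count sign changes in each non-overlapping frame of `signal`.

    Assumption: non-overlapping frames; a trailing partial frame is dropped.
    """
    counts = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # A crossing occurs when consecutive samples differ in sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        counts.append(crossings)
    return counts


def one_against_all_predict(
    features: Sequence[float],
    scorers: Sequence[Callable[[Sequence[float]], float]],
    threshold: Optional[float] = None,
) -> int:
    """Return the index of the class whose binary scorer wins.

    Each scorer stands in for one "class k versus the rest" SVM decision
    function. Hypothetical shortcut: if `threshold` is given, evaluation
    stops at the first scorer whose score reaches it, so the remaining
    scorers are never evaluated.
    """
    best_index, best_score = -1, float("-inf")
    for index, scorer in enumerate(scorers):
        score = scorer(features)
        if threshold is not None and score >= threshold:
            return index  # early exit skips the remaining classifiers
        if score > best_score:
            best_index, best_score = index, score
    return best_index


if __name__ == "__main__":
    toy_signal = [1, -1, 1, -1, 1, 1, 1, 1]
    feats = zero_crossing_counts(toy_signal, frame_len=4)
    # Two toy linear scorers standing in for trained binary SVMs.
    toy_scorers = [lambda f: f[0] - f[1], lambda f: f[1] - f[0]]
    print(feats, one_against_all_predict(feats, toy_scorers, threshold=1.0))
```

The early exit shows only one way a threshold could cut computation. Without a threshold, the function falls back to the usual highest-score rule across all binary classifiers.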
