Paper information - PERFORMANCE OF DIFFERENT CLASSIFIERS IN SPEECH RECOGNITION

Speech is the most natural means of communication among human beings, and speech processing and recognition have been intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker-independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the noise is removed from them using a wavelet denoising method based on soft thresholding. The features are then extracted from the signals using the Discrete Wavelet Transform (DWT), which is well suited to processing non-stationary signals such as speech because of its multiresolution, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem, so the resulting feature vectors are classified using three classifiers capable of handling multiple classes: Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes. During the classification stage, each classifier is trained on feature vectors with known class labels and then tested on the test data set. The performance of the classifiers is evaluated based on recognition accuracy. All three methods produced good recognition accuracy: DWT with ANN achieved 89%, DWT with SVM achieved 86.6%, and DWT with Naive Bayes achieved 83.5%. ANN is found to be the best of the three methods.
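The denoising and feature-extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the Haar wavelet (the abstract does not name a wavelet family), three decomposition levels, the universal threshold sigma * sqrt(2 ln n) with a median-based noise estimate, and sub-band energies as the feature vector. All function names are hypothetical. The signal length must be divisible by 2**levels.

```python
import math
import statistics

_S = 1.0 / math.sqrt(2.0)  # normalisation factor of the orthonormal Haar filters


def haar_dwt(signal):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def wavedec(signal, levels):
    """Multi-level decomposition, ordered [cA_L, cD_L, ..., cD_1]."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return [approx] + details[::-1]


def waverec(coeffs):
    """Reconstruct a signal from the output of wavedec."""
    approx = coeffs[0]
    for d in coeffs[1:]:
        approx = haar_idwt(approx, d)
    return approx


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; values with |c| <= t become 0."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, levels=3):
    """Soft-threshold the detail bands, keep the coarsest approximation."""
    coeffs = wavedec(signal, levels)
    # Assumed noise estimate: median absolute finest-scale detail / 0.6745.
    sigma = statistics.median(abs(c) for c in coeffs[-1]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(signal)))  # universal threshold
    return waverec([coeffs[0]] + [soft_threshold(d, t) for d in coeffs[1:]])


def dwt_features(signal, levels=3):
    """Assumed feature vector: energy of each wavelet sub-band."""
    return [sum(c * c for c in band) for band in wavedec(signal, levels)]
```

With three levels, `dwt_features(denoise(samples))` yields a four-element vector per utterance (one approximation band and three detail bands). Vectors of this kind could then be passed to any multiclass classifier, such as the ANN, SVM or Naive Bayes models the abstract compares.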
