Independent-speaker isolated word speech recognition based on mean-shift framing using hybrid HMM/SVM classifier

This paper studies independent-speaker isolated-word speech recognition based on mean-shift framing with a hybrid HMM/SVM classifier. The proposed framework consists of two main units: a preprocessing unit and a classification unit. The preprocessing unit segments the speech signal into proper frames using a mean-shift gradient clustering algorithm and extracts relevant time-frequency features so as to maximize the relative entropy of the time-frequency energy distribution among segments. The classification unit then assigns each word to the proper class: a self-adaptive HMM computes the word's likelihood under every existing class, and a support vector machine (SVM) classifies the word using the vector of all class likelihoods as its input. To validate the method's accuracy and stability, it was evaluated on the TULIPS1 dataset in the presence of different kinds of additive noise provided by SPIB. Comparison with the results of the previous paper shows a 3.2% improvement.
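
The following is a minimal sketch of the hybrid decision stage described above: one HMM per word class produces a log-likelihood for an utterance, and the vector of all class likelihoods is fed to an SVM for the final decision. It is not the paper's implementation: hmmlearn's GaussianHMM stands in for the self-adaptive HMM, scikit-learn's SVC for the SVM, the frame-level features are assumed to be already extracted (e.g. by the mean-shift framing step), and all function names are hypothetical.

```python
# Sketch of the hybrid HMM/SVM classification stage (assumptions noted above).
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.svm import SVC


def train_class_hmms(train_data, n_states=5):
    """Fit one HMM per word class.
    train_data maps class label -> list of (n_frames, n_features) arrays."""
    hmms = {}
    for label, utterances in train_data.items():
        X = np.vstack(utterances)                   # stack all frames
        lengths = [u.shape[0] for u in utterances]  # per-utterance frame counts
        hmms[label] = GaussianHMM(n_components=n_states,
                                  covariance_type="diag").fit(X, lengths)
    return hmms


def likelihood_vector(hmms, utterance):
    """Log-likelihood of one utterance under every class HMM;
    this vector is the SVM's input."""
    return np.array([hmms[label].score(utterance) for label in sorted(hmms)])


def train_svm(hmms, train_data):
    """Train the SVM on per-class likelihood vectors of the training words."""
    X, y = [], []
    for label, utterances in train_data.items():
        for u in utterances:
            X.append(likelihood_vector(hmms, u))
            y.append(label)
    return SVC(kernel="rbf").fit(np.array(X), np.array(y))


def recognize(hmms, svm, utterance):
    """Classify an isolated word from its frame-level feature matrix."""
    return svm.predict(likelihood_vector(hmms, utterance).reshape(1, -1))[0]
```

The key design point mirrored from the abstract is that the SVM does not see raw features; it sees only the per-class HMM likelihoods, so the SVM learns to resolve confusions between classes whose HMM scores are close.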
