A hybrid SVM/DDBHMM decision fusion modeling for robust continuous digital speech recognition

This paper proposes an improved hybrid support vector machine and duration distribution based hidden Markov model (SVM/DDBHMM) decision fusion model for robust continuous digital speech recognition. We investigate combining the probability outputs of a support vector machine and a Gaussian mixture model for pattern recognition (called FSVM), and embed the fused probability as a similarity measure in the phone state level decision space of our duration distribution based hidden Markov model (DDBHMM) speech recognition system (named FSVM/DDBHMM). The performance of FSVM is evaluated on the Iris database, and that of FSVM/DDBHMM on a continuous Mandarin digital speech corpus in four noise environments from NOISEX-92 (white, Volvo, babble and destroyer engine). The experimental results show that FSVM is effective on the Iris data, and that FSVM/DDBHMM achieves an average word error rate reduction ranging from 6% to 20% relative to the DDBHMM baseline at signal-to-noise ratios (SNRs) from -5 dB to 30 dB in steps of 5 dB.
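To make the fusion step concrete, the following is a minimal sketch of how per-class probability outputs from an SVM and a GMM might be combined and turned into a log-domain similarity score for a state-level decoder. The abstract does not state the combination rule, so the weighted sum rule used here, the `weight` parameter, the function names, and the toy numbers are all assumptions made for illustration, not the authors' method.

```python
import math


def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def gmm_posteriors(likelihoods, priors):
    """Bayes rule: convert class-conditional GMM likelihoods to posteriors."""
    return normalize([l * p for l, p in zip(likelihoods, priors)])


def fuse(svm_probs, gmm_probs, weight=0.5):
    """Weighted sum rule (an assumed choice, not taken from the paper).

    weight = 1.0 uses only the SVM output, weight = 0.0 only the GMM output.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    fused = [weight * s + (1.0 - weight) * g
             for s, g in zip(svm_probs, gmm_probs)]
    return normalize(fused)


def log_similarity(fused_probs, floor=1e-10):
    """Log-domain scores, floored to avoid log(0), for use in a decoder."""
    return [math.log(max(p, floor)) for p in fused_probs]


# Toy three-class example (hypothetical numbers).
svm_probs = [0.70, 0.20, 0.10]   # e.g. calibrated SVM posteriors
gmm_like = [0.02, 0.05, 0.01]    # class-conditional GMM likelihoods
priors = [1.0 / 3.0] * 3         # equal class priors (assumption)

gmm_probs = gmm_posteriors(gmm_like, priors)
fused = fuse(svm_probs, gmm_probs, weight=0.5)
scores = log_similarity(fused)
best = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case the two classifiers disagree (the SVM favors class 0, the GMM class 1), and the fused posterior settles the decision. In a hybrid system of the kind described above, scores like `scores` would stand in for the per-state observation similarity during decoding.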
