SVM classifiers for ASR: A discussion about parameterization

Automatic Speech Recognition (ASR) is essentially a pattern classification problem; however, the time dimension of the speech signal has prevented ASR from being posed as a simple static classification problem. Support Vector Machine (SVM) classifiers could provide an appropriate solution, since they are well suited to high-dimensional classification problems. Nevertheless, using SVMs for ASR is by no means straightforward, because SVM classifiers require a fixed-dimension input. In this paper we propose and compare three alternatives for adapting the parameterization to the fixed input dimension required by SVMs. We show that SVM classifiers outperform the conventional HMM-based ASR system when the speech signal is parameterized at properly selected instants.
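To make the fixed-dimension constraint concrete, the following is a minimal sketch of one generic way to map an utterance with a variable number of frames onto a vector of constant length, by keeping only the frames at uniformly spaced instants. It is not one of the three alternatives compared in the paper, which the abstract does not detail; the function name `fixed_length_vector`, the parameter `num_instants`, and the toy two-coefficient frames are all assumptions made for illustration.

```python
from typing import List, Sequence


def fixed_length_vector(
    frames: Sequence[Sequence[float]], num_instants: int
) -> List[float]:
    """Concatenate the frames found at uniformly spaced instants.

    Illustrative assumption: `frames` holds one feature vector per
    analysis frame, and all frames share the same number of coefficients.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if num_instants < 1:
        raise ValueError("num_instants must be positive")

    last = len(frames) - 1
    if num_instants == 1:
        # A single instant: take the middle frame.
        indices = [last // 2]
    else:
        # Spread the instants from the first frame to the last one.
        indices = [k * last // (num_instants - 1) for k in range(num_instants)]

    vector: List[float] = []
    for i in indices:
        vector.extend(frames[i])
    return vector


# Two toy "utterances" of different durations, two coefficients per frame.
short_utt = [[float(t), float(10 * t)] for t in range(7)]
long_utt = [[float(t), float(10 * t)] for t in range(40)]

# Both map to the same dimension: num_instants * coefficients per frame.
print(len(fixed_length_vector(short_utt, 5)), len(fixed_length_vector(long_utt, 5)))
```

Whatever the duration of the utterance, the output has `num_instants` times the per-frame dimension, so it could be fed to any classifier that expects a fixed-size input. Uniform spacing is only the simplest choice; the abstract's conclusion suggests that which instants are selected matters for accuracy.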
