Robust ASR using Support Vector Machines

Owing to their max-margin training paradigm, Support Vector Machines (SVMs) have better theoretical properties than other machine learning alternatives, which has led us to propose them as a good technique for robust speech recognition. However, several important obstacles have had to be circumvented, the most important being the normalisation of the duration of different realisations of the same acoustic speech unit. In this paper, we compare two approaches in noisy environments: first, a hybrid HMM-SVM solution in which a fixed number of frames is selected by means of an HMM segmentation; and second, a normalisation kernel called the Dynamic Time Alignment Kernel (DTAK), first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841-1844] and based on Dynamic Time Warping (DTW). Special attention has been paid to adapting both alternatives to noisy environments, comparing two types of parameterisation and applying suitable feature normalisation. The results show that DTAK provides important advantages over the baseline HMM system in medium to severe noise conditions, and also outperforms the hybrid system.
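
The way DTAK turns two variable-length sequences into a single similarity value can be sketched in a few lines. This is a minimal Python sketch, not the paper's implementation: it assumes the symmetric dynamic-programming recursion commonly used for DTAK (horizontal and vertical steps weight the local kernel by 1, diagonal steps by 2, and the result is normalised by the sum of both sequence lengths) and an RBF frame-level kernel. The names `rbf_frame_kernel` and `dtak`, and the default `gamma=0.1`, are hypothetical choices for illustration, and the sketch omits the parameterisation and noise-normalisation steps discussed above.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def rbf_frame_kernel(x: Frame, v: Frame, gamma: float = 0.1) -> float:
    """Frame-level RBF kernel exp(-gamma * ||x - v||^2); an assumed choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-gamma * sq_dist)


def dtak(X: List[Frame], V: List[Frame], gamma: float = 0.1) -> float:
    """Dynamic Time Alignment Kernel between two variable-length sequences.

    Symmetric DP recursion (assumed): horizontal/vertical steps add the local
    kernel with weight 1, diagonal steps with weight 2, and the start cell has
    weight 2. Every alignment path then carries a total weight of
    len(X) + len(V), which is used as the normaliser.
    """
    if not X or not V:
        raise ValueError("sequences must be non-empty")
    n, m = len(X), len(V)
    # G[i][j]: best weighted similarity of an alignment ending at frame pair (i, j).
    G = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            k_ij = rbf_frame_kernel(X[i], V[j], gamma)
            if i == 0 and j == 0:
                G[i][j] = 2.0 * k_ij
                continue
            candidates = []
            if i > 0:  # advance in X only
                candidates.append(G[i - 1][j] + k_ij)
            if j > 0:  # advance in V only
                candidates.append(G[i][j - 1] + k_ij)
            if i > 0 and j > 0:  # advance in both (diagonal step)
                candidates.append(G[i - 1][j - 1] + 2.0 * k_ij)
            G[i][j] = max(candidates)
    return G[n - 1][m - 1] / (n + m)


if __name__ == "__main__":
    short = [[0.0], [1.0], [2.0]]
    stretched = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]  # same unit, spoken slower
    print(dtak(short, stretched))
```

Because the RBF values never exceed 1, the kernel lies in (0, 1], and a time-stretched copy of a sequence (as in the usage example) scores 1.0 because the alignment absorbs the duration difference. Evaluated pairwise over training segments, such a function yields a Gram matrix that can be passed to any SVM implementation accepting precomputed kernels; like other DTW-based similarities, it is not guaranteed to be positive semi-definite in general, and each evaluation costs O(len(X) * len(V)).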

[1] Samy Bengio, et al. Client Dependent GMM-SVM Models for Speaker Verification, 2003, ICANN.

[2] Chih-Jen Lin, et al. Probability Estimates for Multi-class Classification by Pairwise Coupling, 2003, J. Mach. Learn. Res.

[3] B. Yegnanarayana, et al. Combining evidence from multiple classifiers for recognition of consonant-vowel units of speech in multiple languages, 2005, Proceedings of 2005 International Conference on Intelligent Sensing and Information Processing.

[4] John Cotton, et al. Introductory Statistics, 3rd ed., 1978.

[5] Marco Gori, et al. A survey of hybrid ANN/HMM models for automatic speech recognition, 2001, Neurocomputing.

[6] Hui Jiang, et al. Large margin hidden Markov models for speech recognition, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[7] G. Ruske, et al. A hybrid RBF-HMM system for continuous speech recognition, 1995.

[8] Koby Crammer, et al. Advances in Neural Information Processing Systems 14, 2002.

[9] Christopher J. C. Burges, et al. Simplified Support Vector Decision Rules, 1996, ICML.

[10] Carmen Peláez-Moreno, et al. Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition, 2006, Speech Commun.

[11] Yoshua Bengio, et al. Neural networks for speech and sequence recognition, 1996.

[12] Gerhard Rigoll, et al. A hybrid SVM/HMM acoustic modeling approach to automatic speech recognition, 2004, INTERSPEECH.

[13] Mark J. F. Gales, et al. Using SVMs and discriminative models for speech recognition, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15] Steve Renals, et al. The Use of Recurrent Neural Networks in Continuous Speech Recognition, 1996.

[16] S. Levinson, et al. Considerations in dynamic time warping algorithms for discrete word recognition, 1978.

[17] Jason Weston, et al. Multi-Class Support Vector Machines, 1998.

[18] Alexander J. Smola, et al. Learning with kernels, 1998.

[19] Stephen A. McGuire, et al. Introductory Statistics, 2007, Technometrics.

[20] Fernando Pérez-Cruz, et al. SVM classifiers for ASR: A discussion about parameterization, 2004, 2004 12th European Signal Processing Conference.

[21] Mahesan Niranjan, et al. Data-dependent kernels in SVM classification of speech patterns, 2000, INTERSPEECH.

[22] P. Bartlett, et al. Probabilities for SV Machines, 2000.

[23] Shigeki Sagayama, et al. Dynamic Time-Alignment Kernel in Support Vector Machine, 2001, NIPS.

[24] Chih-Jen Lin, et al. A comparison of methods for multiclass support vector machines, 2002, IEEE Trans. Neural Networks.

[25] Thorsten Joachims, et al. Making large scale SVM learning practical, 1998.

[26] Ken-ichi Iso, et al. Speaker-independent word recognition using dynamic programming neural networks, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[27] Alex Waibel, et al. Continuous speech recognition using linked predictive neural networks, 1991, ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[28] Joseph Picone, et al. Applications of support vector machines to speech recognition, 2004, IEEE Transactions on Signal Processing.

[29] Marcos Faundez-Zanuy. Nonlinear Analyses and Algorithms for Speech Processing: International Conference on Non-Linear Speech Processing, NOLISP 2005, Barcelona, Spain, April 19-22, 2005, Revised Selected Papers, 2006, NOLISP.

[30] Kuldip K. Paliwal, et al. Automatic Speech and Speaker Recognition: Advanced Topics, 1999.

[31] Yoram Singer, et al. Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers, 2000, J. Mach. Learn. Res.

[32] Joseph Picone, et al. A sparse modeling approach to speech recognition based on relevance vector machines, 2002, INTERSPEECH.

[33] Fernando Pérez-Cruz, et al. Weighted least squares training of support vector classifiers leading to compact and adaptive schemes, 2001, IEEE Trans. Neural Networks.

[34] Carmen Peláez-Moreno, et al. A Speech Recognizer Based on Multiclass SVMs with HMM-Guided Segmentation, 2005, NOLISP.

[35] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[36] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[37] Hervé Bourlard, et al. Connectionist Speech Recognition: A Hybrid Approach, 1993.

[38] A. Benyettou, et al. Lagrangian support vector machines for phoneme classification, 2002, Proceedings of the 9th International Conference on Neural Information Processing (ICONIP '02).

[39] Steve Renals, et al. SVMSVM: support vector machine speaker verification methodology, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[40] Hsuan-Tien Lin, et al. A note on Platt’s probabilistic outputs for support vector machines, 2007, Machine Learning.

[41] Joseph Picone, et al. Support vector machines for speech recognition, 1998, ICSLP.

[42] James R. Glass. A probabilistic framework for segment-based speech recognition, 2003, Comput. Speech Lang.

[43] Alexander J. Smola, et al. Advances in Large Margin Classifiers, 2000.

[44] Shigeki Sagayama, et al. Support vector machine with dynamic time-alignment kernel for speech recognition, 2001, INTERSPEECH.

[45] Shai Fine, et al. A hybrid GMM/SVM approach to speaker identification, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[46] David Haussler, et al. Exploiting Generative Models in Discriminative Classifiers, 1998, NIPS.

[47] Ken-ichi Iso, et al. Speaker-independent word recognition using a neural prediction model, 1990, International Conference on Acoustics, Speech, and Signal Processing.

[48] Johannes Fürnkranz, et al. Round Robin Classification, 2002, J. Mach. Learn. Res.

[49] Joseph Picone, et al. Hybrid SVM/HMM architectures for speech recognition, 2000, INTERSPEECH.

[50] Pedro J. Moreno, et al. On the use of support vector machines for phonetic classification, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP99).

[51] Federico Girosi, et al. An improved training algorithm for support vector machines, 1997, Neural Networks for Signal Processing VII: Proceedings of the 1997 IEEE Signal Processing Society Workshop.

[52] Koby Crammer, et al. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, 2002, J. Mach. Learn. Res.

[53] Mark A. Randolph, et al. A support vector machines-based rejection technique for speech recognition, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[54] J. Picone, et al. Advances in Speech Recognition Using Sparse Bayesian Methods, 1993.

[55] Boonserm Kijsirikul, et al. Support Vector Machines for Thai Phoneme Recognition, 2001, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[56] Mark J. F. Gales, et al. Speech Recognition using SVMs, 2001, NIPS.