Continuous speech recognition using Correlation features and structured SVM probability output

One potential area for improvement in continuous speech recognition is the modelling of phoneme transitions (as distinct from transition probabilities) arising from the non-stationarity of speech. Refined models can be used to compute probability distributions that serve as emission probabilities in HMM-based speech recognition systems. In this paper we present our approach to improving phoneme transition modelling. Building on our previous work, we employ a phoneme partition approach (SME: start, middle, and end states) to build a structure of support vector (SV) classifiers as our main discriminative method. For the phoneme classification step, cross-correlation features based on MFCC vectors are computed and classified within the SME structure. Additionally, we make use of a special reproducing kernel built upon the correlation features, which allows direct integration into the SV classifiers. This paper discusses the computation of the aforementioned probability outputs and presents initial results from using these outputs as emission probabilities in HMMs representing phonemes, applied within a standard speech recognition system.
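To illustrate how classifier outputs could be turned into emission probabilities, the following is a minimal sketch and not the method of this paper. It assumes that each SME state has one SV classifier producing a real-valued decision value, that a sigmoid of the form 1 / (1 + exp(a*f + b)) maps that value to a probability (Platt-style scaling, an assumption here), and that the three values are normalised to sum to one. The function names and the parameters `a` and `b` are hypothetical; in practice the sigmoid parameters would be fitted on held-out data.

```python
import math

SME_STATES = ("start", "middle", "end")


def platt_probability(decision_value, a, b):
    """Map an SVM decision value f to 1 / (1 + exp(a*f + b)).

    a and b are assumed to be sigmoid parameters fitted beforehand.
    The two branches avoid overflow in exp for large |a*f + b|.
    """
    z = a * decision_value + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def sme_emission_probabilities(decision_values, sigmoid_params):
    """Normalise per-state sigmoid outputs into a distribution over SME states.

    decision_values: dict mapping each state to its classifier's decision value.
    sigmoid_params:  dict mapping each state to its (a, b) pair.
    """
    raw = {
        state: platt_probability(decision_values[state], *sigmoid_params[state])
        for state in SME_STATES
    }
    total = sum(raw.values())
    if total == 0.0:
        # All outputs underflowed to zero: fall back to a uniform distribution.
        return {state: 1.0 / len(SME_STATES) for state in SME_STATES}
    return {state: p / total for state, p in raw.items()}


if __name__ == "__main__":
    # Hypothetical decision values for one feature vector.
    values = {"start": 2.0, "middle": -1.0, "end": -3.0}
    params = {state: (-1.0, 0.0) for state in SME_STATES}
    print(sme_emission_probabilities(values, params))
```

A negative `a` makes larger decision values map to larger probabilities. The resulting distribution could then be supplied to an HMM decoder in place of a Gaussian mixture likelihood, subject to whatever scaling the decoder expects.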
