Kernelized log-linear models for continuous speech recognition

Large-margin criteria and discriminative models are two effective improvements to HMM-based speech recognition. This paper proposes a kernelized log-linear model, trained with a large-margin criterion, for continuous speech recognition (CSR). To avoid explicit computation in the high-dimensional feature space and to obtain nonlinear decision boundaries, a kernel-based training and decoding framework is proposed. To make the system robust to noise, a kernel adaptation scheme is also presented. Previous work in this area is extended in two directions. First, most kernels for CSR measure the similarity between two observation sequences; the proposed joint kernels instead define a similarity between two observation-label sequence pairs at the sentence level. Second, this paper addresses how to employ kernels efficiently in large-margin training and decoding with lattices. To the best of our knowledge, this is the first attempt at using large-margin kernel-based log-linear models for CSR. The model is evaluated on a noise-corrupted continuous digit task, AURORA 2.0.
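The joint-kernel idea can be illustrated with a small sketch. It rests on several simplifying assumptions that are ours, not the paper's formulation. First, the joint kernel is taken to factorize as the product of an observation kernel and a label kernel. Second, the observation kernel is a plain RBF on each sequence's mean frame. Third, the label kernel is a bag-of-words dot product. Fourth, the dual weights `alphas` are assumed to come from some large-margin training step that is not shown. Competing sentence hypotheses are also enumerated in an explicit list rather than read from a lattice. All names (`obs_kernel`, `label_kernel`, `joint_kernel`, `posterior`) are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical types: an observation sequence is a list of feature frames,
# a label sequence is a tuple of word tokens for the whole sentence.
Obs = List[Sequence[float]]
Label = Tuple[str, ...]
Pair = Tuple[Obs, Label]


def mean_frame(obs: Obs) -> List[float]:
    """Average the frames so sequences of different lengths can be compared."""
    dim = len(obs[0])
    return [sum(frame[d] for frame in obs) / len(obs) for d in range(dim)]


def obs_kernel(a: Obs, b: Obs, gamma: float = 0.5) -> float:
    """Hypothetical observation kernel: RBF between mean frames."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(mean_frame(a), mean_frame(b)))
    return math.exp(-gamma * sq_dist)


def label_kernel(u: Label, v: Label) -> float:
    """Hypothetical label kernel: dot product of word-count vectors."""
    cu, cv = Counter(u), Counter(v)
    return float(sum(count * cv[word] for word, count in cu.items()))


def joint_kernel(x1: Pair, x2: Pair) -> float:
    """Similarity between two (observation, label) pairs as a product kernel."""
    (o1, w1), (o2, w2) = x1, x2
    return obs_kernel(o1, o2) * label_kernel(w1, w2)


def posterior(obs: Obs, candidates: List[Label],
              support: List[Pair], alphas: List[float]) -> List[float]:
    """P(w | obs) over an explicit candidate list, using only kernel evaluations."""
    scores = [sum(a * joint_kernel((obs, w), s) for a, s in zip(alphas, support))
              for w in candidates]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Toy usage: two stored training pairs with (assumed) dual weights of 1.0.
support = [([[0.0, 0.0], [0.2, 0.0]], ("one",)),
           ([[1.1, 1.0], [1.1, 1.0]], ("two",))]
alphas = [1.0, 1.0]
query = [[0.1, 0.0]]
candidates = [("one",), ("two",)]
probs = posterior(query, candidates, support, alphas)
print(probs)  # "one" receives the higher posterior
```

The product of two valid kernels is itself a valid kernel, so `joint_kernel` stays positive semidefinite. The posterior is computed only from kernel evaluations against stored observation-label pairs and never from an explicit high-dimensional feature vector. This is the property the kernel-based framework relies on. The paper's framework scores hypotheses from lattices, and its kernels are not specified here, so this toy should be read only as an illustration of the structure.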