Infinite support vector machines in speech recognition

Generative feature spaces provide an elegant way to apply discriminative models in speech recognition, and system performance has been improved by adapting this framework. However, the classes in the feature space may be not linearly separable. Applying a linear classifier then limits performance. Instead of a single classifier, this paper applies a mixture of experts. This model trains different classifiers as experts focusing on different regions of the feature space. However, the number of experts is not known in advance. This problem can be bypassed by employing a Bayesian non-parametric model. In this paper, a specific mixture of experts based on the Dirichlet process, namely the infinite support vector machine, is studied. Experiments conducted on the noise-corrupted continuous digit task AURORA 2 show the advantages of this Bayesian nonparametric approach. Index Terms: generative feature space, Bayesian nonparametric, Dirichlet process, mixture of experts, infinite support vector machines

[1]  David Pearce,et al.  The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions , 2000, INTERSPEECH.

[2]  Mark J. F. Gales,et al.  Speech Recognition using SVMs , 2001, NIPS.

[3]  Michael I. Jordan,et al.  A Sticky HDP-HMM With Application to Speaker Diarization , 2009, 0905.2592.

[4]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[5]  Ning Chen,et al.  Infinite SVM: a Dirichlet Process Mixture of Large-margin Kernel Machines , 2011, ICML.

[6]  Mark Gales,et al.  Structured Discriminative Models For Speech Recognition: An Overview , 2012, IEEE Signal Processing Magazine.

[7]  Mark J. F. Gales,et al.  Discriminative classifiers with adaptive kernels for noise robust speech recognition , 2010, Comput. Speech Lang..

[8]  Nando de Freitas,et al.  An Introduction to MCMC for Machine Learning , 2004, Machine Learning.

[9]  Yee Whye Teh,et al.  Bayesian Nonparametric Models , 2010, Encyclopedia of Machine Learning.

[10]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[11]  Mark J. F. Gales,et al.  Structured SVMs for Automatic Speech Recognition , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  Yee Whye Teh,et al.  Dirichlet Process , 2017, Encyclopedia of Machine Learning and Data Mining.

[13]  Mark J. F. Gales,et al.  Derivative kernels for noise robust ASR , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[14]  Carl E. Rasmussen,et al.  The Infinite Gaussian Mixture Model , 1999, NIPS.

[15]  Monte Carlo Integration Markov Chain Monte Carlo and Gibbs Sampling , 2002 .

[16]  Li Deng,et al.  HMM adaptation using vector taylor series for noisy speech recognition , 2000, INTERSPEECH.

[17]  Mark J. F. Gales,et al.  The Application of Hidden Markov Models in Speech Recognition , 2007, Found. Trends Signal Process..

[18]  Carl E. Rasmussen,et al.  Infinite Mixtures of Gaussian Process Experts , 2001, NIPS.

[19]  J. Sethuraman A CONSTRUCTIVE DEFINITION OF DIRICHLET PRIORS , 1991 .

[20]  M. Escobar,et al.  Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .

[21]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[22]  Erik B. Sudderth Graphical models for visual object recognition and tracking , 2006 .

[23]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[24]  Mark J. F. Gales,et al.  Structured Log Linear Models for Noise Robust Speech Recognition , 2010, IEEE Signal Processing Letters.

[25]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .