Training Augmented Models Using SVMs

There has been significant interest in developing new forms of acoustic model, in particular models that can represent dependencies beyond those contained within a standard hidden Markov model (HMM). This paper discusses one such class of models, augmented statistical models. Here, a local exponential approximation is made about some point on a base model. This allows dependencies within the data to be modelled beyond those represented in the base distribution. Augmented models based on Gaussian mixture models (GMMs) and HMMs are briefly described. These augmented models are then related to generative kernels, one approach for applying support vector machines (SVMs) to variable-length data. The training of augmented statistical models within an SVM generative-kernel framework is then discussed. This may be viewed as using maximum margin training to estimate statistical models. Augmented Gaussian mixture models are then evaluated by rescoring on a large vocabulary speech recognition task.
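
As a rough illustration of how a generative kernel turns variable-length data into something an SVM can use, the sketch below maps sequences of scalar observations to a fixed-length score-space vector built from a GMM base model: the sequence log-likelihood plus its derivatives with respect to the component means. Everything here is an assumption made for illustration and is not taken from the paper: the one-dimensional GMM, the parameter values, the function names, the restriction to mean derivatives, and the plain dot-product kernel (a practical system would normally apply a metric to normalise the score-space and would build the features from the competing class models).

```python
import math


def component_densities(x, weights, means, variances):
    """Weighted Gaussian density of each GMM component at scalar x."""
    return [
        c * math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        for c, mu, var in zip(weights, means, variances)
    ]


def log_likelihood(x, weights, means, variances):
    """Log-likelihood of one observation under the base GMM."""
    return math.log(sum(component_densities(x, weights, means, variances)))


def mean_scores(x, weights, means, variances):
    """Derivative of the log-likelihood with respect to each component mean:
    posterior(m | x) * (x - mu_m) / var_m."""
    dens = component_densities(x, weights, means, variances)
    total = sum(dens)
    return [(d / total) * (x - mu) / var
            for d, mu, var in zip(dens, means, variances)]


def score_space(sequence, weights, means, variances):
    """Fixed-length feature vector for a sequence of any length:
    [sequence log-likelihood, summed derivative for each mean]."""
    loglik = 0.0
    grads = [0.0] * len(means)
    for x in sequence:
        loglik += log_likelihood(x, weights, means, variances)
        for m, g in enumerate(mean_scores(x, weights, means, variances)):
            grads[m] += g
    return [loglik] + grads


def linear_kernel(a, b):
    """Plain dot product between two score-space vectors (no metric applied)."""
    return sum(u * v for u, v in zip(a, b))


# Hypothetical base-model parameters, chosen only for this example.
WEIGHTS = [0.6, 0.4]
MEANS = [-1.0, 2.0]
VARIANCES = [1.0, 0.5]

short_seq = [0.1, -0.7]
long_seq = [1.9, 2.3, 1.5, -0.2, 2.1]

phi_short = score_space(short_seq, WEIGHTS, MEANS, VARIANCES)
phi_long = score_space(long_seq, WEIGHTS, MEANS, VARIANCES)
kernel_value = linear_kernel(phi_short, phi_long)
```

Both sequences map to vectors of the same length (one log-likelihood term plus one derivative per component mean), so their inner product is well defined even though the sequences differ in length. In an augmented model, a weighted sum of such derivative terms is added to the base log-likelihood, which is why a linear SVM trained in this space can be read as a maximum margin estimate of the augmentation weights.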
