Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

For phoneme classification, this paper describes an acoustic model based on the variational Gaussian process dynamical system (VGPDS). The nonlinear and non-parametric acoustic model is adopted to overcome the limitations of classical hidden Markov models (HMMs) in modeling speech. The Gaussian process prior on the dynamics and emission functions respectively enable the complex dynamic structure and long-range dependency of speech to be better represented than that by an HMM. In addition, a variance constraint in the VGPDS is introduced to eliminate the sparse approximation error in the kernel matrix. The effectiveness of the proposed model is demonstrated with three experimental results, including parameter estimation and classification performance, on the synthetic and benchmark datasets.

[1]  Neil D. Lawrence,et al.  Bayesian Gaussian Process Latent Variable Model , 2010, AISTATS.

[2]  Dong Yu,et al.  Structured speech modeling , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  David J. Fleet,et al.  This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Gaussian Process Dynamical Model , 2007 .

[4]  Neil D. Lawrence,et al.  Learning for Larger Datasets with the Gaussian Process Latent Variable Model , 2007, AISTATS.

[5]  Carl E. Rasmussen,et al.  A Unifying View of Sparse Approximate Gaussian Process Regression , 2005, J. Mach. Learn. Res..

[6]  Hsiao-Wuen Hon,et al.  Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[7]  F. Jelinek,et al.  Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[8]  Neil D. Lawrence,et al.  Variational Gaussian Process Dynamical Systems , 2011, NIPS.

[9]  Neil D. Lawrence,et al.  Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models , 2005, J. Mach. Learn. Res..

[10]  Mari Ostendorf,et al.  From HMM's to segment models: a unified view of stochastic modeling for speech recognition , 1996, IEEE Trans. Speech Audio Process..

[11]  Geoffrey E. Hinton,et al.  Acoustic Modeling Using Deep Belief Networks , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  Lawrence K. Saul,et al.  Large Margin Hidden Markov Models for Automatic Speech Recognition , 2006, NIPS.