Joint state and parameter estimation for a target-directed nonlinear dynamic system model

We present a new approach to joint state and parameter estimation for a target-directed, nonlinear dynamic system model with switching states. The model, recently proposed for representing speech dynamics, is called the hidden dynamic model (HDM). The model parameters subject to statistical estimation are the target vector and the system matrix (also called the "time-constants"), together with the parameters that characterize the nonlinear mapping from the hidden state to the observation. This nonlinear mapping is implemented as a three-layer feedforward multilayer perceptron (MLP) network, so its parameters are the MLP weights. The new estimation approach is based on the extended Kalman filter (EKF), and its performance is compared with that of the traditional expectation-maximization (EM) based approach. Extensive simulation results are presented for both approaches under typical HDM speech modeling conditions. The EKF-based algorithm converges better than the EM algorithm, but it incurs an excessive computational load when used to train the MLP weights. In all cases, the simulated model output converges to the given observation sequence. However, the time-constant parameters converge to their true values only when the MLP weights or the target vector are assumed known. We also show that the MLP weights never converge to their true values, which illustrates the many-to-one mapping property of the feedforward MLP. We conclude that restrictions on the parameter space are needed for the system to be identifiable.
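The idea of joint state and parameter estimation with an EKF can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a scalar hidden state that follows a target-directed recursion x_k = r x_{k-1} + (1 - r) T + w_k, where r plays the role of the time-constant and T the target, and an observation o_k = h(x_k) + v_k, where h is a three-layer MLP whose weights (W1, B1, W2, B2) are hypothetical and treated as known. The EKF appends r and T to the state and filters the augmented vector [x, r, T], so the hidden state and the parameters are estimated together. Switching between targets is omitted; the noise variances, the clamp on r, and all function names (simulate, ekf_joint, and so on) are illustrative assumptions.

```python
import math
import random

# Hypothetical, fixed MLP weights (1 input, 3 tanh hidden units, 1 output).
# Illustrative values only; the MLP weights are treated as KNOWN here.
W1 = [1.5, -0.8, 0.6]
B1 = [0.1, 0.3, -0.2]
W2 = [0.7, 0.5, 1.1]
B2 = 0.05


def mlp(x):
    """Three-layer MLP h(x): scalar input -> tanh hidden layer -> linear output."""
    return sum(w2 * math.tanh(w1 * x + b1) for w1, b1, w2 in zip(W1, B1, W2)) + B2


def mlp_grad(x):
    """Analytic dh/dx, the only nonzero entry of the EKF observation Jacobian."""
    return sum(w2 * w1 * (1.0 - math.tanh(w1 * x + b1) ** 2)
               for w1, b1, w2 in zip(W1, B1, W2))


def simulate(r, target, n, q=0.01, rv=0.01, seed=0):
    """Generate x_k = r*x_{k-1} + (1-r)*target + w_k and o_k = h(x_k) + v_k."""
    rng = random.Random(seed)
    x, xs, obs = 0.0, [], []
    for _ in range(n):
        x = r * x + (1.0 - r) * target + rng.gauss(0.0, math.sqrt(q))
        xs.append(x)
        obs.append(mlp(x) + rng.gauss(0.0, math.sqrt(rv)))
    return xs, obs


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def ekf_joint(obs, s0, p0, q=0.01, rv=0.01, q_param=0.0):
    """Augmented-state EKF over s = [x, r, target] with known MLP weights."""
    s = list(s0)
    p = [list(row) for row in p0]
    for o in obs:
        x, r, t = s
        # Predict: the hidden state moves toward the target; the parameters
        # follow a random walk with variance q_param (zero by default).
        s = [r * x + (1.0 - r) * t, r, t]
        f = [[r, x - t, 1.0 - r],   # d(r*x + (1-r)*t) / d[x, r, t]
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
        p = matmul(matmul(f, p), transpose(f))
        p[0][0] += q
        p[1][1] += q_param
        p[2][2] += q_param
        # Update with o = h(x) + v, using the Jacobian H = [h'(x), 0, 0].
        hx = mlp_grad(s[0])
        innov = o - mlp(s[0])
        ph = [p[i][0] * hx for i in range(3)]   # P H^T
        sv = hx * ph[0] + rv                     # innovation variance S
        k = [v / sv for v in ph]                 # Kalman gain
        s = [s[i] + k[i] * innov for i in range(3)]
        p = [[p[i][j] - k[i] * k[j] * sv for j in range(3)] for i in range(3)]
        # Heuristic safeguard (an assumption, not from the paper): keep r stable.
        s[1] = min(max(s[1], 0.0), 0.999)
    return s, p


if __name__ == "__main__":
    _, observations = simulate(r=0.9, target=1.0, n=300)
    est, cov = ekf_joint(observations, s0=[0.0, 0.8, 0.8],
                         p0=[[0.1, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.1]])
    print("x, r, target estimates:", est)
```

Treating the MLP weights as known mirrors one of the cases described above. Adding the MLP weights to the augmented state follows the same pattern, but the covariance matrix then grows with the square of the number of augmented parameters, which is consistent with the heavy computational load reported for EKF training of the MLP weights.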
