ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition

This paper presents a nontraditional approach to estimating the parameters of a stochastic linear system. The method is based on the expectation-maximization (EM) algorithm and can be viewed as the continuous-state analog of the Baum-Welch estimation algorithm for hidden Markov models. The algorithm is used to train the parameters of a dynamical system model, proposed here to better represent the spectral dynamics of speech for recognition. The observed feature vectors of a phone segment are assumed to be the output of a stochastic linear dynamical system, and the paper shows how the evolution of the dynamics as a function of segment length can be modeled under alternative assumptions. A phoneme classification task on the TIMIT database demonstrates that the approach is the first effective use of an explicit model for statistical dependence between frames of speech.
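To make the Baum-Welch analogy concrete, here is a minimal sketch of EM for a time-invariant linear Gaussian state-space model. It is not the paper's exact algorithm. The sketch assumes the generic form x_k = F x_{k-1} + w_k and y_k = H x_k + v_k, with w_k ~ N(0, Q), v_k ~ N(0, R), and x_0 ~ N(mu0, P0). The symbols F, H, Q, R, mu0, and P0 and all function names are illustrative choices, not taken from the paper. The E-step (a Kalman filter followed by a Rauch-Tung-Striebel smoother) plays roughly the role of the forward-backward pass. The M-step then re-estimates F, H, Q, and R in closed form from the smoothed moments. For simplicity, the sketch uses a single observation sequence and holds mu0 and P0 fixed. It does not model the paper's segment-length-dependent dynamics.

```python
import numpy as np


def kalman_smoother(y, F, H, Q, R, mu0, P0):
    """E-step: Kalman filter followed by a Rauch-Tung-Striebel smoother.

    y has shape (N, p); the hidden state has dimension n = F.shape[0].
    Returns smoothed means m (N, n), covariances P (N, n, n), lag-one
    covariances C with C[k] = Cov(x_k, x_{k-1} | y) for k >= 1, and the
    log-likelihood log p(y) accumulated from the innovations.
    """
    N, n = len(y), F.shape[0]
    xp, Pp = np.zeros((N, n)), np.zeros((N, n, n))  # one-step predictions
    xf, Pf = np.zeros((N, n)), np.zeros((N, n, n))  # filtered estimates
    loglik = 0.0
    for k in range(N):
        # Predict: x_0 ~ N(mu0, P0), then x_k = F x_{k-1} + w_k.
        xp[k] = mu0 if k == 0 else F @ xf[k - 1]
        Pp[k] = P0 if k == 0 else F @ Pf[k - 1] @ F.T + Q
        # Update with y_k = H x_k + v_k.
        S = H @ Pp[k] @ H.T + R                  # innovation covariance
        e = y[k] - H @ xp[k]                     # innovation
        K = np.linalg.solve(S, H @ Pp[k]).T      # gain Pp H^T S^{-1}
        xf[k] = xp[k] + K @ e
        Pf[k] = Pp[k] - K @ H @ Pp[k]
        loglik -= 0.5 * (np.linalg.slogdet(S)[1] + e @ np.linalg.solve(S, e)
                         + len(e) * np.log(2 * np.pi))
    m, P, C = xf.copy(), Pf.copy(), np.zeros((N, n, n))
    for k in range(N - 2, -1, -1):
        J = np.linalg.solve(Pp[k + 1], F @ Pf[k]).T  # smoother gain Pf F^T Pp^{-1}
        m[k] = xf[k] + J @ (m[k + 1] - xp[k + 1])
        P[k] = Pf[k] + J @ (P[k + 1] - Pp[k + 1]) @ J.T
        C[k + 1] = P[k + 1] @ J.T                    # Cov(x_{k+1}, x_k | y)
    return m, P, C, loglik


def m_step(y, m, P, C):
    """M-step: closed-form updates of F, H, Q, R from smoothed moments."""
    N = len(y)
    Sxx = P + np.einsum('ki,kj->kij', m, m)                  # E[x_k x_k^T | y]
    Sx1x0 = C[1:] + np.einsum('ki,kj->kij', m[1:], m[:-1])   # E[x_k x_{k-1}^T | y]
    A, B, D = Sxx[1:].sum(0), Sx1x0.sum(0), Sxx[:-1].sum(0)
    F = np.linalg.solve(D, B.T).T                            # B D^{-1} (D symmetric)
    Q = (A - F @ B.T) / (N - 1)
    Syx = y.T @ m                                            # sum_k y_k m_k^T
    H = np.linalg.solve(Sxx.sum(0), Syx.T).T
    R = (y.T @ y - H @ Syx.T) / N
    return F, H, 0.5 * (Q + Q.T), 0.5 * (R + R.T)            # enforce symmetry


def em(y, F, H, Q, R, mu0, P0, n_iter=20):
    """Alternate E- and M-steps; mu0 and P0 stay fixed in this sketch."""
    history = []
    for _ in range(n_iter):
        m, P, C, ll = kalman_smoother(y, F, H, Q, R, mu0, P0)
        history.append(ll)                     # log p(y) under current parameters
        F, H, Q, R = m_step(y, m, P, C)
    return (F, H, Q, R), history


# Synthetic demo with hypothetical parameters (not speech data).
rng = np.random.default_rng(0)
n, p, N = 2, 3, 200
F_true = np.array([[0.9, 0.2], [-0.1, 0.8]])
H_true = rng.standard_normal((p, n))
x, ys = np.zeros(n), []
for _ in range(N):
    x = F_true @ x + 0.3 * rng.standard_normal(n)
    ys.append(H_true @ x + 0.1 * rng.standard_normal(p))
y = np.array(ys)
params, history = em(y, 0.5 * np.eye(n), rng.standard_normal((p, n)),
                     np.eye(n), np.eye(p), np.zeros(n), np.eye(n), n_iter=30)
print(history[0], history[-1])
```

As with Baum-Welch, each EM iteration should not decrease log p(y), which makes the log-likelihood history a useful sanity check when adapting the sketch. The lag-one covariances come from the smoother gain, using the identity Cov(x_{k+1}, x_k | y) = P_{k+1|N} J_k^T. This avoids a separate backward recursion. Making the dynamics depend on segment length, as the paper discusses, would require time-indexed parameters, which this sketch does not attempt.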
