Stochastic segment modelling using the estimate-maximize algorithm (speech recognition)

A probabilistic model, the stochastic segment model, is introduced to describe the statistical dependence among all the frames of a speech segment. The model uses a time-warping transformation to map the sequence of observed frames onto the appropriate frames of the segment model; the joint density of the observed frames is then given by the joint density of the selected model frames. Automatic training and recognition algorithms are discussed, and a few preliminary recognition results are presented.
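The mapping from observed frames to model frames can be illustrated with a minimal sketch. The abstract does not specify the warping function or the form of the density, so the sketch assumes a linear time warp and, purely to keep the code short, independent diagonal-covariance Gaussians for each model frame. That independence assumption drops the inter-frame dependence the segment model is meant to capture; a full joint Gaussian over the selected frames would retain it. The names `linear_warp` and `segment_log_likelihood` are hypothetical and do not come from the paper.

```python
import math


def linear_warp(num_observed, num_model):
    """Assumed linear time warp: model frame index for each observed frame."""
    if num_observed < 1 or num_model < 1:
        raise ValueError("both lengths must be at least 1")
    # Observed frame i lands on model frame floor(i * num_model / num_observed).
    return [(i * num_model) // num_observed for i in range(num_observed)]


def segment_log_likelihood(observed, means, variances):
    """Log density of the observed frames under the selected model frames.

    observed:  list of k feature vectors (one per observed frame)
    means:     list of m mean vectors (one per model frame)
    variances: list of m variance vectors (diagonal covariance, an assumption)
    """
    selected = linear_warp(len(observed), len(means))
    total = 0.0
    for frame, j in zip(observed, selected):
        # Sum per-dimension Gaussian log densities for the selected model frame.
        for x, mu, var in zip(frame, means[j], variances[j]):
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total
```

For example, `linear_warp(3, 6)` selects model frames 0, 2 and 4 for a three-frame observation, while `linear_warp(6, 3)` reuses each of the three model frames twice. In recognition, a score of this kind would be computed for each candidate segment model and the scores compared.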
