A stochastic segment model for phoneme-based continuous speech recognition

The authors introduce a novel approach to modeling variable-duration phonemes, called the stochastic segment model. A phoneme X is observed as a variable-length sequence of frames, where each frame is represented by a parameter vector and the length of the sequence is random. The stochastic segment model consists of (1) a time warping of the variable-length segment X into a fixed-length segment Y, called a resampled segment, and (2) a joint density function of the parameters of X, which in this study is a Gaussian density. The segment model represents spectral/temporal structure over the entire phoneme. The model also allows acoustic-phonetic features derived from X to be incorporated in Y, in addition to the usual spectral features that have been used in hidden Markov modeling and dynamic time warping approaches to speech recognition. The authors describe the stochastic segment model, the recognition algorithm, and an iterative training algorithm for estimating segment models from continuous speech. They present several results using segment models in two speaker-dependent recognition tasks and compare the performance of the stochastic segment model to that of hidden Markov models.
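To make the two ingredients of the model concrete (resampling a variable-length segment to a fixed length, then scoring it under a joint Gaussian density), here is a minimal sketch. It is not the paper's implementation: it assumes NumPy, uses simple linear resampling by evenly spaced frame selection as a stand-in for the paper's time warping, flattens the resampled m-by-d segment into a single observation vector, and the names resample_segment, segment_log_likelihood, m, mean, and cov are hypothetical.

```python
import numpy as np

def resample_segment(x, m):
    """Resample a variable-length segment x (shape T x d) to a fixed-length
    segment y (shape m x d) by selecting m frames at evenly spaced positions.
    This is a simple illustrative stand-in for the model's time warping."""
    T, d = x.shape
    idx = np.round(np.linspace(0, T - 1, m)).astype(int)  # evenly spaced frame indices
    return x[idx]

def segment_log_likelihood(x, m, mean, cov):
    """Score a candidate segment under a single Gaussian segment model.

    mean : (m*d,)      mean of the resampled, flattened segment
    cov  : (m*d, m*d)  covariance of the resampled, flattened segment
    """
    y = resample_segment(x, m).ravel()        # fixed-length observation vector
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)        # log-determinant of the covariance
    quad = diff @ np.linalg.solve(cov, diff)  # Mahalanobis distance term
    k = y.size
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)

# Hypothetical usage: given per-phoneme (mean, cov) models, recognition of a
# hypothesized segment x reduces to picking the phoneme with the highest score.
# best_phoneme = max(models, key=lambda p: segment_log_likelihood(x, m, *models[p]))
```

In this sketch the joint Gaussian over the flattened segment captures correlation across the whole phoneme, which is the property the abstract contrasts with frame-by-frame hidden Markov modeling.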
