The acoustic-modeling problem in automatic speech recognition

Abstract: This thesis examines the acoustic-modeling problem in automatic speech recognition from an information-theoretic point of view. The problem is to design a speech-recognition system that extracts from the speech waveform as much information as possible about the corresponding word sequence. The information-extraction process is broken down into two steps: a signal-processing step, which converts a speech waveform into a sequence of information-bearing acoustic feature vectors, and a modeling step, which models such a sequence. The thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors that lie in a continuous space such as R^N. It explores the trade-off between packing more information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous parameter sequences is addressed by investigating a method of parameter estimation specifically designed to cope with inaccurate modeling assumptions.

Peter F. Brown
