The acoustic-modeling problem in automatic speech recognition

Abstract: This thesis examines the acoustic-modeling problem in automatic speech recognition from an information-theoretic point of view. The problem is to design a speech-recognition system that extracts from the speech waveform as much information as possible about the corresponding word sequence. The information-extraction process is broken down into two steps: a signal-processing step, which converts a speech waveform into a sequence of information-bearing acoustic feature vectors, and a modeling step, which models such a sequence. This thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors that lie in a continuous space such as ℝ^N. It explores the trade-off between packing more information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous-parameter sequences is addressed by investigating a method of parameter estimation that is specifically designed to cope with inaccurate modeling assumptions.
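To make the modeling step concrete, here is a minimal sketch (not code from the thesis) of how a hidden Markov model can score a sequence of continuous feature vectors. It assumes each state emits vectors in ℝ^N from a diagonal-covariance Gaussian, which is an illustrative assumption because the abstract does not specify the form of the output densities. Under that assumption, the forward algorithm computes the sequence likelihood by summing over all state paths in the log domain. The function names and toy parameters are hypothetical.

```python
import math
from typing import Sequence

NEG_INF = -math.inf


def safe_log(p: float) -> float:
    """Natural log that maps probability 0 to -inf (used for forbidden transitions)."""
    return math.log(p) if p > 0.0 else NEG_INF


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at a vector x in R^N."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(
    frames: Sequence[Sequence[float]],
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> float:
    """log P(x_1, ..., x_T) for a continuous-density HMM, via the forward algorithm.

    init[j]      : probability of starting in state j
    trans[i][j]  : probability of moving from state i to state j
    means[j], variances[j] : diagonal Gaussian output density of state j (assumed form)
    """
    n = len(init)
    log_trans = [[safe_log(p) for p in row] for row in trans]
    # alpha[j] = log P(x_1..x_t, state_t = j), initialized at t = 1
    alpha = [
        safe_log(init[j]) + log_gaussian_diag(frames[0], means[j], variances[j])
        for j in range(n)
    ]
    # Recursion: sum over predecessor states, then emit the current frame
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n)])
            + log_gaussian_diag(x, means[j], variances[j])
            for j in range(n)
        ]
    # Termination: sum over all possible final states
    return logsumexp(alpha)


if __name__ == "__main__":
    # Toy two-state left-to-right model over 2-dimensional feature vectors.
    frames = [[0.1, -0.2], [0.0, 0.1], [2.9, 3.2], [3.1, 2.8]]
    ll = forward_log_likelihood(
        frames,
        init=[1.0, 0.0],
        trans=[[0.6, 0.4], [0.0, 1.0]],
        means=[[0.0, 0.0], [3.0, 3.0]],
        variances=[[1.0, 1.0], [1.0, 1.0]],
    )
    print(ll)
```

Working in the log domain avoids the underflow that would otherwise occur when many small per-frame densities are multiplied over a long utterance. The diagonal-Gaussian choice keeps the sketch short. Richer output densities, such as Gaussian mixtures, would change only `log_gaussian_diag`.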
