Exploiting correlations among competing models with application to large vocabulary speech recognition

In a typical speech recognition system, computing the match between an incoming acoustic string and many competing models is computationally expensive. Once the highest ranking models are identified, all other match scores are discarded. The authors propose to make use of all computed scores by means of statistical inference. They view the match between an incoming acoustic string $s$ and a model $M_i$ as a random variable $Y_i$. The class-conditioning distributions of $(Y_1, \ldots, Y_N)$ can be studied offline by sampling, and then used in a variety of ways. For example, the means of these distributions give rise to a natural measure of distance between models. One of the most useful applications of these distributions is as a basis for a new Bayesian classifier. The latter can be used to significantly reduce search effort in large vocabularies, and to quickly obtain a short list of candidate words. An example hidden Markov model (HMM)-based system shows promising results.
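As a rough illustration of the idea, the sketch below fits class-conditional distributions to score vectors offline, uses them in a Bayes rule to rank candidate words, and derives a distance between words from the distribution means. It is not the authors' implementation: the diagonal-Gaussian form, the Euclidean distance between means, the synthetic data, and all names (`fit_class_conditionals`, `shortlist`, `mean_distance`, the words `alpha`, `beta`, `gamma`) are assumptions made for the example. Each score vector stands for the match scores of one utterance against a small set of models.

```python
import math
import random


def fit_class_conditionals(samples):
    """Estimate per-word mean and variance of each score (offline step).

    samples maps a word to a list of score vectors, one per training
    utterance of that word. A diagonal Gaussian is assumed here.
    """
    params = {}
    for word, vectors in samples.items():
        n = len(vectors)
        dim = len(vectors[0])
        means = [sum(v[j] for v in vectors) / n for j in range(dim)]
        variances = [
            # Floor the variance so the log-likelihood stays finite.
            max(sum((v[j] - means[j]) ** 2 for v in vectors) / n, 1e-6)
            for j in range(dim)
        ]
        params[word] = (means, variances)
    return params


def log_likelihood(y, means, variances):
    """Log density of score vector y under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (yj - mu) ** 2 / var)
        for yj, mu, var in zip(y, means, variances)
    )


def shortlist(y, params, priors, k):
    """Return the k words with the highest log posterior (up to a constant)."""
    scored = {
        word: math.log(priors[word]) + log_likelihood(y, means, variances)
        for word, (means, variances) in params.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]


def mean_distance(params, a, b):
    """Euclidean distance between the mean score vectors of two words."""
    mean_a, mean_b = params[a][0], params[b][0]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(mean_a, mean_b)))


# Synthetic training data: scores of each word against two models.
rng = random.Random(0)
true_means = {"alpha": [0.0, 0.0], "beta": [10.0, 10.0], "gamma": [20.0, 0.0]}
samples = {
    word: [[rng.gauss(mu, 1.0) for mu in means] for _ in range(50)]
    for word, means in true_means.items()
}

params = fit_class_conditionals(samples)
priors = {word: 1.0 / len(params) for word in params}  # uniform prior (assumed)

# Rank candidates for a new utterance from its two observed scores.
candidates = shortlist([9.5, 10.2], params, priors, k=2)
print(candidates)
print(mean_distance(params, "alpha", "beta"))
```

The point of the sketch is that only a few scores need to be computed for the incoming utterance; the ranking over all words then comes from the stored distributions rather than from matching against every model.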
