Fast Gaussian likelihood computation by maximum probability increase estimation for continuous speech recognition

Speech signals are semi-stationary, so speech features in neighboring frames are likely to be scored highly by similar sets of Gaussian distributions. A fast Gaussian computation algorithm is therefore proposed to speed up the computation of the N-best posterior probabilities over a large set of Gaussian distributions in large vocabulary continuous speech recognition. For every Gaussian, the maximum probability increase between the current speech frame and a previous reference frame is estimated, so that posteriors need not be computed explicitly for a large number of Gaussians. The method was applied to the fMPE front-end of IBM's state-of-the-art speech recognizer, resulting in a speed-up of 40% in probability computation in a lossless mode and of more than 55% in an approximate implementation.
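
The pruning idea can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussians and uses a simple Cauchy-Schwarz upper bound on the log-likelihood increase; this bound is an assumption of the sketch, not necessarily the estimator used in the paper. The names `log_gauss`, `top_n_with_bound`, `consts`, and `logp_ref` are hypothetical. Exact log-likelihoods for all Gaussians are assumed to be available at the reference frame.

```python
import numpy as np

def log_gauss(x, means, variances, consts):
    """Exact diagonal-Gaussian log-likelihood of x under each row of `means`.
    `consts` holds each Gaussian's additive constant (normalizer, optionally
    plus a log mixture weight)."""
    return consts - 0.5 * np.sum((x - means) ** 2 / variances, axis=1)

def top_n_with_bound(x_t, x_ref, logp_ref, means, variances, consts, n_best):
    """Exact N-best Gaussians for x_t, skipping those that provably cannot qualify.

    Assumed bound (Cauchy-Schwarz, not taken from the paper):
        logp_t(g) <= logp_ref(g) + d_max * m_ref(g)
    where m_ref(g) is the Mahalanobis distance of x_ref to Gaussian g and
    d_max bounds the Mahalanobis length of (x_t - x_ref) under any Gaussian.
    """
    # One displacement bound shared by all Gaussians: use the smallest
    # variance seen in each dimension (this can be precomputed offline).
    min_var = variances.min(axis=0)
    d_max = np.sqrt(np.sum((x_t - x_ref) ** 2 / min_var))

    # Distance of the reference frame to each mean, recovered from stored scores.
    m_ref = np.sqrt(np.maximum(2.0 * (consts - logp_ref), 0.0))
    upper = logp_ref + d_max * m_ref  # O(1) work per Gaussian

    # Seed: score the reference frame's N best exactly to obtain a threshold.
    seed = np.argsort(logp_ref)[-n_best:]
    exact = np.full(len(means), -np.inf)
    exact[seed] = log_gauss(x_t, means[seed], variances[seed], consts[seed])
    threshold = exact[seed].min()

    # Only Gaussians whose upper bound reaches the threshold need exact scoring.
    cand = np.setdiff1d(np.flatnonzero(upper >= threshold), seed)
    exact[cand] = log_gauss(x_t, means[cand], variances[cand], consts[cand])

    top = np.argsort(exact)[-n_best:][::-1]
    return top, exact[top], len(seed) + len(cand)
```

Because a skipped Gaussian has an upper bound strictly below the score of N Gaussians that were evaluated exactly, the returned N-best list matches a brute-force evaluation, which corresponds to a lossless mode. An approximate variant could, for example, tighten the threshold by a margin and accept occasional misses in exchange for fewer evaluations.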
