Fast Gaussian likelihood computation by maximum probability increase estimation for continuous speech recognition

Speech signals are semi-stationary, so speech features in neighboring frames are likely to share similar Gaussian distributions. Based on this observation, a fast Gaussian computation algorithm is proposed to speed up the computation of the N-best posterior probabilities over a large set of Gaussian distributions in large vocabulary continuous speech recognition. For every Gaussian distribution, the maximum probability increase between the current speech frame and a previous reference frame is estimated, so that posteriors need not be computed explicitly for a large number of Gaussians. Applied to the fMPE front-end of IBM's state-of-the-art speech recognizer, the method yields a decoding speed-up in probability computation of 40% in a lossless mode and of more than 55% in an approximate implementation.
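The abstract does not state the exact form of the estimate. As a hedged illustration only, the sketch below assumes diagonal-covariance Gaussians and uses an assumed triangle-inequality bound, not necessarily the authors' estimator. With whitened vectors z = (x - μ)/σ, δ = x_t - x_r and k_g = max_d 1/σ_{g,d}, we have ||z_t|| >= ||z_r|| - k_g·||δ||, so log N_g(x_t) <= c_g - ½·max(0, ||z_r|| - k_g·||δ||)². Measured against the exact score at the reference frame, this bounds the largest possible log-likelihood increase of each Gaussian. Gaussians whose bound falls below the current N-th best exact score are skipped, which keeps the N-best set lossless. All names (`DiagGaussian`, `reference_distances`, `loglik_upper_bound`, `n_best_with_bound`) are hypothetical.

```python
import heapq
import math


class DiagGaussian:
    """Diagonal-covariance Gaussian with constants precomputed once (hypothetical helper)."""

    def __init__(self, mean, var):
        self.mean = list(mean)
        self.var = list(var)
        dim = len(self.mean)
        # log normalizer: -0.5 * (D * log(2*pi) + sum_d log(var_d))
        self.log_const = -0.5 * (dim * math.log(2 * math.pi) + sum(math.log(v) for v in self.var))
        # k = max_d 1/sigma_d; bounds the whitened length of any frame shift
        self.inv_min_sigma = 1.0 / math.sqrt(min(self.var))

    def mahalanobis(self, x):
        """Whitened distance ||(x - mean) / sigma||, O(D)."""
        return math.sqrt(sum((xi - m) ** 2 / v for xi, m, v in zip(x, self.mean, self.var)))

    def log_likelihood(self, x):
        r = self.mahalanobis(x)
        return self.log_const - 0.5 * r * r


def reference_distances(gaussians, x_ref):
    """Exact whitened distances at the reference frame, computed once per reference."""
    return [g.mahalanobis(x_ref) for g in gaussians]


def loglik_upper_bound(g, ref_dist, shift):
    """O(1) upper bound on log N(x_cur) given ||z_ref|| and shift = ||x_cur - x_ref||.

    Assumed bound (triangle inequality in whitened space):
    ||z_cur|| >= ||z_ref|| - ||delta / sigma|| >= ref_dist - k * shift.
    """
    lower = max(0.0, ref_dist - g.inv_min_sigma * shift)
    return g.log_const - 0.5 * lower * lower


def n_best_with_bound(gaussians, x_ref, ref_dists, x_cur, n):
    """Lossless N-best: returns (list of (loglik, index) best first, #exact evaluations)."""
    if n <= 0:
        return [], 0
    shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_cur, x_ref)))
    bounds = sorted(
        ((loglik_upper_bound(g, d, shift), i) for i, (g, d) in enumerate(zip(gaussians, ref_dists))),
        reverse=True,  # most promising Gaussians first
    )
    heap = []  # min-heap of (loglik, index) holding the current n best
    evaluated = 0
    for ub, i in bounds:
        if len(heap) == n and ub <= heap[0][0]:
            # Bounds are sorted, so no remaining Gaussian can beat the n-th best.
            break
        ll = gaussians[i].log_likelihood(x_cur)
        evaluated += 1
        if len(heap) < n:
            heapq.heappush(heap, (ll, i))
        elif ll > heap[0][0]:
            heapq.heapreplace(heap, (ll, i))
    return sorted(heap, reverse=True), evaluated
```

Each bound costs O(1) once the reference distances are known, versus O(D) for an exact score, so the saving depends on how many Gaussians the bound rules out. A single k_g per Gaussian keeps the bound cheap but loosens it for Gaussians with very uneven variances. How often the reference frame is refreshed, and how the approximate mode trades accuracy for speed, are not specified in the abstract and are left out of this sketch.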

Liang Gu | Yuqing Gao | Nicolás Morales

[1] Geoffrey Zweig, et al. fMPE: discriminatively trained features for speech recognition, 2005, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[2] Geoffrey Zweig, et al. The IBM 2006 Gale Arabic ASR System, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07).

[3] Alexander I. Rudnicky, et al. On improvements to CI-based GMM selection, 2005, INTERSPEECH.

[4] Wonyong Sung, et al. Mobile CPU Based Optimization of Fast Likelihood Computation for Continuous Speech Recognition, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07).

[5] Chin-Hui Lee, et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, 1994, IEEE Trans. Speech Audio Process.

[6] J.H.L. Hansen, et al. Fast likelihood computation techniques in nearest-neighbor based search for continuous speech recognition, 2001, IEEE Signal Processing Letters.

[7] Peter Beyerlein, et al. Hamming distance approximation for a fast log-likelihood computation for mixture densities, 1995, EUROSPEECH.

[8] Hermann Ney, et al. Fast likelihood computation methods for continuous mixture densities in large vocabulary speech recognition, 1997, EUROSPEECH.

[9] Pietro Laface, et al. Analysis and improvement of the partial distance search algorithm, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10] Enrico Bocchieri, et al. Vector quantization for the efficient computation of continuous density likelihoods, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11] Daniel Povey, et al. Minimum Phone Error and I-smoothing for improved discriminative training, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12] Frank Seide, et al. Fast likelihood computation for continuous-mixture densities using a tree-based nearest neighbor search, 1995, EUROSPEECH.