Improved Factorization of a Connectionist Language Model for Single-Pass Real-Time Speech Recognition

Statistical language models are often difficult to estimate because of the so-called “curse of dimensionality”. Connectionist Language Models overcome this problem by using a distributed word representation that is learned jointly with the synaptic weights of the neural network. This work describes improvements that make Connectionist Language Models practical for single-pass real-time speech recognition: computing each word’s probability independently of the other words, and a novel factorization of the lexical tree. Experiments comparing the improved model with a standard Connectionist Language Model on a Large-Vocabulary Continuous Speech Recognition (LVCSR) task show that the new method runs about 33 times faster while degrading word-level recognition accuracy only marginally.
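To make the two ideas concrete, the following is a minimal NumPy sketch; it is an illustration only, not the paper’s implementation. It assumes that “computing word probabilities independently” means replacing the vocabulary-wide softmax with per-word scores that need no shared normalization, and that the lexical-tree factorization works in the spirit of classical LM look-ahead, where each tree node carries the best score among the words in its subtree. All names (`W_out`, `word_score`, the toy lexicon) and shapes are hypothetical.

```python
# Minimal sketch (NumPy, toy sizes) contrasting a softmax output layer
# with independent per-word scores, and factorizing those scores over a
# lexical prefix tree. Hypothetical illustration, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
V, H = 8, 4                      # toy vocabulary size, hidden-layer width

W_out = rng.normal(scale=0.5, size=(V, H))   # output-layer weights
b_out = np.zeros(V)
h = rng.normal(size=H)           # hidden state for some word history

# 1) Standard softmax output: every word's probability depends on a
#    normalizing sum over the WHOLE vocabulary, so scoring even one
#    candidate word costs O(V).
logits = W_out @ h + b_out
p_softmax = np.exp(logits - logits.max())
p_softmax /= p_softmax.sum()

# 2) Independent per-word scores (here: sigmoid outputs): each word is
#    scored on its own in O(H), with no coupling through a partition
#    function.
def word_score(w: int) -> float:
    return 1.0 / (1.0 + np.exp(-(W_out[w] @ h + b_out[w])))

# 3) Factorizing scores over a lexical (pronunciation prefix) tree, in
#    the spirit of LM look-ahead: annotate each node with the best score
#    of the words in its subtree, so the decoder can prune partial
#    hypotheses before the word identity is known. The tree is just a
#    dict mapping a node to the word ids below it.
tree = {"root": [0, 1, 2, 3, 4, 5, 6, 7],
        "k-":   [0, 1, 2, 3],
        "ka-":  [0, 1],
        "ko-":  [2, 3],
        "s-":   [4, 5, 6, 7]}

lookahead = {node: max(word_score(w) for w in words)
             for node, words in tree.items()}

print(p_softmax)
print({n: round(s, 3) for n, s in lookahead.items()})
```

Note the interaction between the two ideas: with independent scores, annotating a tree node costs only a maximum over its subtree, whereas with a softmax every such annotation would still pay the full O(V) normalization.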
