Improved Factorization of a Connectionist Language Model for Single-Pass Real-Time Speech Recognition

Statistical language models are often difficult to estimate because of the so-called “curse of dimensionality”. Connectionist Language Models overcome this problem by using a distributed word representation that is learned jointly with the synaptic weights of the neural network. This work describes improvements that make Connectionist Language Models practical for single-pass real-time speech recognition: computing each word’s probability independently of the other words, and a novel factorization of the lexical tree. Experiments comparing the improved model with a standard Connectionist Language Model on a Large-Vocabulary Continuous Speech Recognition (LVCSR) task show that the new method runs about 33 times faster while degrading word-level recognition accuracy only marginally.
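To make the two ideas concrete, the following is a minimal NumPy sketch; it is an illustration only, not the paper’s implementation. It assumes that “computing word probabilities independently” means replacing the vocabulary-wide softmax with per-word scores that need no shared normalization, and that the lexical-tree factorization works in the spirit of classical LM look-ahead, where each tree node carries the best score among the words in its subtree. All names (`W_out`, `word_score`, the toy lexicon) and shapes are hypothetical.

```python
# Minimal sketch (NumPy, toy sizes) contrasting a softmax output layer
# with independent per-word scores, and factorizing those scores over a
# lexical prefix tree. Hypothetical illustration, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
V, H = 8, 4                      # toy vocabulary size, hidden-layer width

W_out = rng.normal(scale=0.5, size=(V, H))   # output-layer weights
b_out = np.zeros(V)
h = rng.normal(size=H)           # hidden state for some word history

# 1) Standard softmax output: every word's probability depends on a
#    normalizing sum over the WHOLE vocabulary, so scoring even one
#    candidate word costs O(V).
logits = W_out @ h + b_out
p_softmax = np.exp(logits - logits.max())
p_softmax /= p_softmax.sum()

# 2) Independent per-word scores (here: sigmoid outputs): each word is
#    scored on its own in O(H), with no coupling through a partition
#    function.
def word_score(w: int) -> float:
    return 1.0 / (1.0 + np.exp(-(W_out[w] @ h + b_out[w])))

# 3) Factorizing scores over a lexical (pronunciation prefix) tree, in
#    the spirit of LM look-ahead: annotate each node with the best score
#    of the words in its subtree, so the decoder can prune partial
#    hypotheses before the word identity is known. The tree is just a
#    dict mapping a node to the word ids below it.
tree = {"root": [0, 1, 2, 3, 4, 5, 6, 7],
        "k-":   [0, 1, 2, 3],
        "ka-":  [0, 1],
        "ko-":  [2, 3],
        "s-":   [4, 5, 6, 7]}

lookahead = {node: max(word_score(w) for w in words)
             for node, words in tree.items()}

print(p_softmax)
print({n: round(s, 3) for n, s in lookahead.items()})
```

Note the interaction between the two ideas: with independent scores, annotating a tree node costs only a maximum over its subtree, whereas with a softmax every such annotation would still pay the full O(V) normalization.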
