Paper information - Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs

Combining forward and backward long short-term memory (LSTM) recurrent neural network (RNN) language models is a popular approach to improving the estimation of the sequence probability in second-pass N-best list rescoring for automatic speech recognition (ASR). In this work, we push this idea further by proposing a combination of three models: a forward LSTM language model, a backward LSTM language model, and a bi-directional LSTM-based gap completion model. We derive this combination method from a forward-backward decomposition of the sequence probability and carry out experiments on the Switchboard speech recognition task. Although we find empirically that this combination gives slight improvements in perplexity over the combination of forward and backward models, we ultimately show that a combination of the same number of forward models gives the best perplexity and word error rate (WER) overall.
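To make the rescoring setup concrete, the following is a minimal sketch of second-pass N-best rescoring with a forward and a backward language model. It assumes the two sentence-level log-probabilities are combined by log-linear interpolation, which is a common choice but is not confirmed here as the paper's exact combination rule. The names `Hypothesis`, `combined_lm_logprob`, `rescore`, `fwd_weight`, and `lm_scale` are hypothetical, and the scores in the usage example are invented for illustration only.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    """One entry of an N-best list (all fields are illustrative)."""
    words: List[str]
    acoustic_logprob: float  # score from the first-pass recognizer
    fwd_lm_logprob: float    # log p(sentence) under a forward LM
    bwd_lm_logprob: float    # log p(sentence) under a backward LM


def combined_lm_logprob(hyp: Hypothesis, fwd_weight: float = 0.5) -> float:
    """Log-linear interpolation of forward and backward LM scores.

    Assumption: a weighted sum in the log domain; the weight would
    normally be tuned on held-out data.
    """
    return (fwd_weight * hyp.fwd_lm_logprob
            + (1.0 - fwd_weight) * hyp.bwd_lm_logprob)


def rescore(nbest: List[Hypothesis],
            lm_scale: float = 1.0,
            fwd_weight: float = 0.5) -> Hypothesis:
    """Return the hypothesis with the best total score."""
    return max(
        nbest,
        key=lambda h: h.acoustic_logprob
        + lm_scale * combined_lm_logprob(h, fwd_weight),
    )


# Toy N-best list with made-up scores.
nbest = [
    Hypothesis(["i", "see"], -10.0, -4.0, -6.0),
    Hypothesis(["eye", "sea"], -9.5, -8.0, -9.0),
]
best = rescore(nbest)
print(" ".join(best.words))
```

In this toy list the second hypothesis has the better acoustic score, but the combined language model score favors the first, so rescoring selects it. Setting `lm_scale=0.0` disables the language models and falls back to the acoustic ranking.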

Hermann Ney | Ralf Schlüter | Kazuki Irie | Zhihong Lei | Liuhui Deng
