Semantic word embedding neural network language models for automatic speech recognition

Semantic word embeddings have become increasingly important in natural language processing over the last few years. This popularity is due to their ability to capture rich semantic information through a distributed representation, and to the availability of fast, scalable algorithms for learning them from large text corpora. State-of-the-art neural network language models (NNLMs) used in automatic speech recognition (ASR) and natural language processing also learn word embeddings, but these are optimized to model local N-gram dependencies in the training text rather than to capture semantic information. We hypothesize that semantic word embeddings carry information complementary to the word embeddings learned by NNLMs. We propose novel feedforward NNLM architectures that incorporate semantic word embeddings, apply the resulting models to ASR on broadcast news, and show improvements in both perplexity and word error rate.
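
The abstract does not specify how the semantic embeddings enter the network, so the following is only a minimal sketch, assuming the pretrained semantic vectors (e.g., word2vec or GloVe) are kept fixed and concatenated with the NNLM's own trainable embeddings at the input layer of a standard feedforward N-gram model. The class name `SemanticFeedforwardNNLM`, all layer sizes, and the concatenation choice are illustrative assumptions, written in PyTorch:

```python
# A minimal sketch (not the authors' exact architecture): a feedforward
# N-gram NNLM whose input concatenates a trainable embedding with a
# frozen, pretrained semantic embedding. Sizes and the combination-by-
# concatenation choice are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class SemanticFeedforwardNNLM(nn.Module):
    def __init__(self, vocab_size, context_size, learned_dim, semantic_vectors):
        super().__init__()
        semantic_dim = semantic_vectors.size(1)
        # Standard NNLM embeddings, trained to model N-gram dependencies.
        self.learned_emb = nn.Embedding(vocab_size, learned_dim)
        # Pretrained semantic embeddings, kept fixed during LM training.
        self.semantic_emb = nn.Embedding.from_pretrained(semantic_vectors,
                                                         freeze=True)
        input_dim = context_size * (learned_dim + semantic_dim)
        self.hidden = nn.Linear(input_dim, 500)
        self.output = nn.Linear(500, vocab_size)

    def forward(self, context):  # context: (batch, context_size) word IDs
        # Concatenate the two embeddings for each context position.
        x = torch.cat([self.learned_emb(context),
                       self.semantic_emb(context)], dim=-1)
        x = torch.tanh(self.hidden(x.flatten(1)))  # join context positions
        return self.output(x)  # logits over the next word

# Usage: a 4-gram LM (3-word history) over a 10k vocabulary with
# 300-dimensional "pretrained" vectors (random here, as a stand-in).
pretrained = torch.randn(10_000, 300)
model = SemanticFeedforwardNNLM(10_000, 3, 100, pretrained)
logits = model(torch.randint(0, 10_000, (8, 3)))  # shape (8, 10000)
```

Freezing the semantic table preserves the semantic structure learned from the large corpus, while the trainable table remains free to specialize to the N-gram prediction task, which is one plausible way to combine the two sources of information the abstract describes.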
