Performance analysis of Neural Networks in combination with n-gram language models

Neural network language models (NNLMs) have recently become an important complement to conventional n-gram language models (LMs) in speech-to-text systems. However, relatively little is known about the behavior of NNLMs. The analysis presented in this paper aims to understand which types of events are modeled better by NNLMs than by n-gram LMs, in which cases the improvements are most substantial, and why this is the case. Such an analysis is important for deriving further benefit from NNLMs used in combination with conventional n-gram models. The analysis is carried out for different types of neural network LMs (feed-forward and recurrent). The results, which show for which types of events NNLMs provide better probability estimates, are validated on two setups that differ in size and in the degree of data homogeneity.
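
In this line of work, the combination of an NNLM with an n-gram LM is typically realized as linear interpolation of the two models' per-word probabilities. The sketch below illustrates this standard scheme, not the paper's specific implementation; the scoring callables `nnlm_prob` and `ngram_prob` and the weight `lam` are hypothetical placeholders.

```python
import math

def interpolated_logprob(word, history, nnlm_prob, ngram_prob, lam=0.5):
    """Log-probability of one word under a linearly interpolated LM.

    nnlm_prob and ngram_prob are hypothetical callables returning
    P(word | history) under the respective model; lam is the weight
    given to the NNLM, usually tuned on held-out data.
    """
    p = lam * nnlm_prob(word, history) + (1.0 - lam) * ngram_prob(word, history)
    return math.log(p)

def perplexity(sentence, nnlm_prob, ngram_prob, lam=0.5):
    """Perplexity of a tokenized sentence under the interpolated model."""
    total = sum(
        interpolated_logprob(w, sentence[:i], nnlm_prob, ngram_prob, lam)
        for i, w in enumerate(sentence)
    )
    return math.exp(-total / len(sentence))
```

An event-level analysis of the kind described in the abstract can then compare the two terms of the interpolation word by word, identifying the contexts in which the NNLM term dominates the n-gram term.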
