On a Hybrid NN/HMM Speech Recognition System with a RNN-Based Language Model

In this paper, we present a new hybrid NN/HMM speech recognition system with an NN-based acoustic model and an RNN-based language model. The neural-network-based acoustic model computes posteriors for the states of context-dependent acoustic units. A recurrent neural network with the maximum entropy extension is used as the language model. We compare this system with our previous hybrid NN/HMM system, which used a standard n-gram language model, and with a standard GMM/HMM system. System performance was evaluated on a British English speech corpus and compared with some previous work.
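The abstract states that the acoustic network outputs state posteriors, but not how they enter HMM decoding. A common approach in hybrid systems is to divide each posterior by the state prior, which gives a scaled likelihood. The sketch below illustrates that conversion only. It is not taken from the paper, and the function name, the flooring constant, and the toy numbers are assumptions made for illustration.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Turn state posteriors P(s|x) for one frame into scaled log-likelihoods.

    By Bayes' rule, p(x|s) is proportional to P(s|x) / P(s) for a fixed
    frame x, so log P(s|x) - log P(s) can stand in for the HMM emission
    score. The `floor` value (an assumption) avoids log(0).
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [
        math.log(max(p, floor)) - math.log(max(q, floor))
        for p, q in zip(posteriors, priors)
    ]


# Toy frame with three HMM states (hypothetical numbers).
posteriors = [0.7, 0.2, 0.1]    # network outputs for this frame
priors = [0.5, 0.25, 0.25]      # e.g. relative state frequencies in training data
scores = scaled_log_likelihoods(posteriors, priors)
best_state = max(range(len(scores)), key=scores.__getitem__)
```

In a decoder, such scores would replace the GMM log-likelihoods of a GMM/HMM system, while the language model (n-gram or RNN-based) contributes a separate score for each word sequence.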
