Paper Information - On a Hybrid NN/HMM Speech Recognition System with a RNN-Based Language Model

In this paper, we present a new NN/HMM speech recognition system with an NN-based acoustic model and an RNN-based language model. The neural-network-based acoustic model computes posteriors for the states of context-dependent acoustic units. A recurrent neural network with the maximum entropy extension was used as the language model. This hybrid NN/HMM system was compared with our previous hybrid NN/HMM system, which was equipped with a standard n-gram language model. In our experiments, we also compared it to a standard GMM/HMM system. The system performance was evaluated on the British English speech corpus and compared with some previous work.

Jan Zelinka | Ludek Müller | Daniel Soutner

[1] Martin A. Riedmiller, et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm, 1993, IEEE International Conference on Neural Networks.

[2] Lukás Burget, et al. Extensions of recurrent neural network language model, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3] Josef Psutka, et al. Towards live subtitling of TV ice-hockey commentary, 2013, 2013 International Conference on Signal Processing and Multimedia Applications (SIGMAP).

[4] Steve Renals, et al. WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[5] Jan Svec, et al. Fast Phonetic/Lexical Searching in the Archives of the Czech Holocaust Testimonies: Advancing Towards the MALACH Project Visions, 2010, TSD.

[6] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.

[7] Jan Zelinka, et al. On context-dependent neural networks and speaker adaptation, 2012, 2012 IEEE 11th International Conference on Signal Processing.

[8] Guangsen Wang, et al. Sequential Classification Criteria for NNs in Automatic Speech Recognition, 2011, INTERSPEECH.

[9] Jan Trmal. Spatio-temporal structure of feature vectors in neural network adaptation, 2012.

[10] Roman Grundkiewicz, et al. Automatic Extraction of Polish Language Errors from Text Edition History, 2013, TSD.

[11] Steve Young, et al. The HTK book version 3.4, 2006.