Paper information - DNN-Based Acoustic Modeling for Russian Speech Recognition Using Kaldi

In this paper, we describe a study of DNN-based acoustic modeling for Russian speech recognition. The system was trained and tested using the open-source Kaldi toolkit. We created tanh and p-norm DNNs with different numbers of hidden layers, and for the tanh DNNs we also varied the number of hidden units. The models were tested on a very large vocabulary continuous Russian speech recognition task. We obtained a relative word error rate (WER) reduction of 20% compared to the baseline GMM-HMM system.
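
As a rough illustration of two terms used in the abstract, the sketch below shows how a relative WER reduction is usually computed and how a p-norm unit, as it is commonly defined, collapses a group of inputs into one output. The function names, the group size, and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the fraction by which new_wer improves on baseline_wer (0.2 means 20%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def pnorm_layer(activations, group_size, p=2.0):
    """Map each consecutive group of inputs to y = (sum_i |x_i|^p)^(1/p).

    group_size and p are illustrative parameters, not the paper's settings.
    """
    if group_size <= 0 or len(activations) % group_size != 0:
        raise ValueError("input length must be a positive multiple of group_size")
    return [
        sum(abs(x) ** p for x in activations[i:i + group_size]) ** (1.0 / p)
        for i in range(0, len(activations), group_size)
    ]
```

For example, with made-up values, a baseline WER of 25.0 and a new WER of 20.0 give a relative reduction of 0.2, and `pnorm_layer([3.0, -4.0, 0.0, 0.0], group_size=2)` reduces four inputs to two outputs.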

Alexey Karpov | Irina S. Kipyatkova

[1] Daniel Jurafsky, et al. Building DNN acoustic models for large vocabulary speech recognition, 2014, Comput. Speech Lang.

[2] Piero Cosi. A KALDI-DNN-based ASR system for Italian, 2015, 2015 International Joint Conference on Neural Networks (IJCNN).

[3] Daniel P. W. Ellis, et al. Tandem acoustic modeling in large-vocabulary recognition, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Andrey Ronzhin, et al. Large vocabulary Russian speech recognition using syntactico-statistical language modeling, 2014, Speech Commun.

[5] Jan Cernocký, et al. Probabilistic and Bottle-Neck Features for LVCSR of Meetings, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07).

[6] Dong Yu, et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[7] Yajie Miao, et al. Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN, 2014, ArXiv.

[8] Yoshua Bengio, et al. Greedy Layer-Wise Training of Deep Networks, 2006, NIPS.

[9] Maxim Korenevsky, et al. Improving Acoustic Models for Russian Spontaneous Speech Recognition, 2015, SPECOM.

[10] Dong Yu, et al. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks, 2012, ICML.

[11] Andrey Ronzhin, et al. Very Large Vocabulary ASR for Spoken Russian with Syntactic and Morphemic Analysis, 2011, INTERSPEECH.

[12] Sanjeev Khudanpur, et al. Parallel training of DNNs with Natural Gradient and Parameter Averaging, 2014.

[13] Lukás Burget, et al. Sequence-discriminative training of deep neural networks, 2013, INTERSPEECH.

[14] Natalia A. Tomashenko, et al. Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing, 2014, INTERSPEECH.

[15] Dong Yu, et al. Automatic Speech Recognition: A Deep Learning Approach, 2014.

[16] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.

[17] Alexey Karpov, et al. Lexicon Size and Language Model Order Optimization for Russian LVCSR, 2013, SPECOM.

[18] Alexey Karpov, et al. Analysis of long-distance word dependencies and pronunciation variability at conversational Russian speech recognition, 2012, 2012 Federated Conference on Computer Science and Information Systems (FedCSIS).

[19] Hermann Ney, et al. Improved backing-off for M-gram language modeling, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[20] Andreas Stolcke, et al. SRILM at Sixteen: Update and Outlook, 2011.

[21] Vlado Delic, et al. Deep Neural Network Based Continuous Speech Recognition for Serbian Using the Kaldi Toolkit, 2015, SPECOM.

[22] Xiaohui Zhang, et al. Improving deep neural network acoustic models using generalized maxout networks, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).