Mongolian Speech Recognition Based on Deep Neural Networks

Mongolian is an influential language, and better Mongolian Large Vocabulary Continuous Speech Recognition (LVCSR) systems are required. Recently, speech recognition research has improved substantially with the introduction of Deep Neural Networks (DNNs). In this study, a DNN-based Mongolian LVCSR system is built. Experimental results show that the DNN-based models outperform conventional models based on Gaussian Mixture Models (GMMs) for Mongolian speech recognition by a large margin. Compared with the best GMM-based model, the DNN-based one obtains a relative improvement of over 50%, making it a new state-of-the-art system in this field.
