Speech Recognition in Mongolian Language using a Neural Network with pre-processing Technique

In this paper, we develop a neural network model capable of recognizing a limited number of words in the Mongolian language. Four Mongolian words were chosen with a view to later designing and building a special device with an audio interface. For the experiment, we used audio recordings made on a computer with a microphone in an ordinary room with minimal background noise. The database of recordings used to train the neural network consists of speech from 11 people (7 men and 4 women): one is around 20–30 years old, three are 60–70, and the rest are 30–40. The work was carried out on a regular personal computer with a 3rd-generation Intel Core i5 processor and 8 GB of DDR4 RAM.
