Paper information - Speech Recognition in Mongolian Language using a Neural Network with pre-processing Technique

In this paper, we develop a neural network model capable of recognizing a limited number of words in the Mongolian language. We chose four Mongolian words, selected with a view to later designing and building a special device with an audio interface. In the experiment, we used audio recordings made on a computer with a microphone in an ordinary room with minimal background noise. The database of recordings used to train the neural network consists of speech from 11 people (7 men and 4 women): one is around 20–30 years old, three are 60–70, and the rest are 30–40. The work was carried out on a regular personal computer with a 3rd-generation Intel Core i5 processor and 8 GB of DDR IV RAM.
