Time Delay Neural Network for Myanmar Automatic Speech Recognition

The Time Delay Neural Network (TDNN) is a neural network architecture. In automatic speech recognition (ASR), the TDNN is well suited to context modeling and recognizes phonemes and acoustic features independently of their position in time. Many techniques have been applied to improve Myanmar speech processing. This paper presents a TDNN-based acoustic model for Myanmar ASR. Myanmar is a low-resource language because pre-collected data is scarce. A larger dataset and lexicon are used in this work. The speech corpus covers three domains: names, web news, and daily conversational data. The corpus totals 77 hours, 2 minutes, and 11 seconds and includes 233 female speakers and 97 male speakers. The performance of the TDNN for Myanmar ASR is evaluated by comparing it with a Gaussian Mixture Model (GMM) baseline system, a Deep Neural Network (DNN), and a Convolutional Neural Network (CNN). The experiments use two test sets: TestSet1 (web news) and TestSet2 (recorded conversational data). The experimental results show that the TDNN outperforms the GMM-HMM, DNN, and CNN systems.
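
To illustrate the property that a TDNN responds to acoustic patterns independently of their position in time, the following is a minimal sketch of a single TDNN layer in plain Python. It is not the acoustic model used in this paper: the function name `tdnn_layer`, the context offsets, the layer sizes, the ReLU activation, and the random weights are all assumptions chosen for illustration. The sketch applies the same weights to a spliced window of frames at every time step and checks that shifting the input by two frames shifts the output by two frames.

```python
import random


def tdnn_layer(frames, weights, bias, offsets):
    """One TDNN layer (illustrative sketch, not the paper's model).

    frames:  list of T feature vectors, each a list of D floats
    weights: list of H rows, each of length len(offsets) * D
    bias:    list of H floats
    offsets: relative time offsets spliced together, e.g. [-1, 0, 1]
    """
    # Only compute outputs where the whole context window fits in the input.
    start = max(0, -min(offsets))
    stop = len(frames) - max(0, max(offsets))
    outputs = []
    for t in range(start, stop):
        # Splice the frames at t + k for every offset k into one vector.
        spliced = [x for k in offsets for x in frames[t + k]]
        # The same weights are shared across all time steps; ReLU activation.
        outputs.append([
            max(0.0, sum(w * x for w, x in zip(row, spliced)) + b)
            for row, b in zip(weights, bias)
        ])
    return outputs


# Toy setup: every value below is a hypothetical, illustrative choice.
rng = random.Random(0)
D, H, T = 2, 3, 10            # feature dim, hidden units, number of frames
offsets = [-1, 0, 1]          # one frame of left and right context
weights = [[rng.uniform(-1, 1) for _ in range(len(offsets) * D)] for _ in range(H)]
bias = [rng.uniform(-1, 1) for _ in range(H)]

# A short "pattern" placed at frames 3..5 of an otherwise silent input.
frames_a = [[0.0] * D for _ in range(T)]
frames_a[3], frames_a[4], frames_a[5] = [1.0, 0.5], [0.2, -0.3], [-0.7, 0.9]

# The same input delayed by two frames.
shift = 2
frames_b = [[0.0] * D for _ in range(shift)] + frames_a[:T - shift]

out_a = tdnn_layer(frames_a, weights, bias, offsets)
out_b = tdnn_layer(frames_b, weights, bias, offsets)

# Delaying the input by `shift` frames delays the output by `shift` frames.
shift_equivariant = all(
    out_a[i] == out_b[i + shift] for i in range(len(out_a) - shift)
)
print(shift_equivariant)
```

Because the layer reuses one set of weights at every time step, the check is expected to hold regardless of where the pattern is placed. In practice a TDNN stacks several such layers with wider offsets in the upper layers, so that higher layers cover longer temporal contexts.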
