Long short-term memory recurrent neural network architectures for Urdu acoustic modeling

Recurrent neural networks (RNNs) have recently achieved remarkable improvements in acoustic modeling. However, the potential of RNNs has not yet been utilized for modeling Urdu acoustics. Connectionist temporal classification (CTC) and attention-based RNNs are hindered by the unavailability of a lexicon and by the computational cost of training, respectively. Therefore, we explored contemporary long short-term memory (LSTM) and gated recurrent neural networks for Urdu acoustic modeling. The efficacies of plain, deep, bidirectional, and deep-bidirectional network architectures are evaluated empirically. Results indicate that the deep-bidirectional architecture has an advantage over the others. A word error rate of 20% was achieved on a hundred-word dataset recorded by twenty speakers, a 15% improvement over the baseline single-layer LSTM. Two-layer architectures improve performance over single-layer ones; however, performance degrades when further layers are added. LSTM architectures were also compared with gated recurrent unit (GRU) based architectures, and LSTM was found to have an advantage over GRU.
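The abstract reports results as a word error rate (WER) but does not say how the metric was computed. As a minimal sketch, assuming the standard definition (substitutions plus deletions plus insertions, divided by the number of reference words), WER can be computed with a word-level edit distance. The function name `word_error_rate` and the placeholder tokens in the usage note are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words.

    Assumes the standard WER definition; the paper's exact scoring
    procedure is not given in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, with the made-up token strings `"w1 w2 w3 w4 w5"` as reference and `"w1 wX w3 w4 w5"` as hypothesis, one of five words is substituted, so the function returns 0.2, which corresponds to a WER of 20%.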
