Comprehending Real Numbers: Development of Bengali Real Number Speech Corpus

Speech recognition has received a less attention in Bengali literature due to the lack of a comprehensive dataset. In this paper, we describe the development process of the first comprehensive Bengali speech dataset on real numbers. It comprehends all the possible words that may arise in uttering any Bengali real number. The corpus has ten speakers from the different regions of Bengali native people. It comprises of more than two thousands of speech samples in a total duration of closed to four hours. We also provide a deep analysis of our corpus, highlight some of the notable features of it, and finally evaluate the performances of two of the notable Bengali speech recognizers on it.

[1]  Quoc V. Le,et al.  Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Md Saiful Islam,et al.  A noble approach for recognizing Bangla real number automatically using CMU Sphinx4 , 2016, 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV).

[3]  Wojciech Zaremba,et al.  Recurrent Neural Network Regularization , 2014, ArXiv.

[4]  Md Saiful Islam,et al.  Bengali speech recognition: A double layered LSTM-RNN approach , 2017, 2017 20th International Conference of Computer and Information Technology (ICCIT).

[5]  Firoj Alam,et al.  Development of annotated Bangla speech corpora , 2010, SLTU.

[6]  Paul Lamere,et al.  Sphinx-4: a flexible open source framework for speech recognition , 2004 .

[7]  Navdeep Jaitly,et al.  Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.