Pashto isolated digits recognition using deep convolutional neural network

Speech recognition has become one of the most significant parts of human-computer interaction with the emergence of technologies such as smartphones and smart watches, so there is a clear need for automatic speech recognition (ASR) in local languages. The aim of this paper is to develop an isolated-digit recognition system for the Pashto language using a deep convolutional neural network (CNN). A database of Pashto digits from 0 to 9, with 50 utterances per digit, is used. Twenty MFCC features are extracted for each isolated digit and fed as input to the CNN. The network used in the proposed system has up to four convolutional layers, followed by ReLU and max-pooling layers. The network was trained on 50% of the data, and the remaining data was used for testing. An overall average test accuracy of 84.17% was achieved, which is 7.32% better than existing similar works.
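The data partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract gives only the corpus size (10 digits, 50 utterances each) and the 50% training share. The per-digit (stratified) split, the utterance identifiers, and the reading of "total average" as the mean of per-digit accuracies are all assumptions made for this example.

```python
import random
from collections import defaultdict

DIGITS = range(10)          # Pashto digits 0-9 (from the abstract)
UTTERANCES_PER_DIGIT = 50   # from the abstract
TRAIN_FRACTION = 0.5        # from the abstract


def stratified_split(samples, train_fraction, seed=0):
    """Split (utterance_id, label) pairs so each label keeps the same train share.

    Stratifying by digit is an assumption; the abstract only states a 50% split.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = int(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


def average_per_digit_accuracy(true_labels, predicted_labels):
    """Mean of the per-digit accuracies (one assumed reading of 'total average')."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        correct[truth] += int(truth == guess)
    return sum(correct[d] / total[d] for d in total) / len(total)


# Hypothetical utterance identifiers standing in for the real recordings.
samples = [
    (f"digit{d}_utt{u:02d}", d)
    for d in DIGITS
    for u in range(UTTERANCES_PER_DIGIT)
]
train_set, test_set = stratified_split(samples, TRAIN_FRACTION)
```

Under these assumptions each digit contributes 25 utterances to training and 25 to testing, so every digit carries equal weight in the averaged test accuracy.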
