Quran Reciter Identification: A Deep Learning Approach

Speech-based intelligent systems built on deep learning are becoming increasingly important because of their wide range of everyday applications. Most work on voice signal processing has been limited to the English language; little effort has focused on voice signal processing for Arabic or for the Quran, the central religious book of Islam. In this study, our objective is to develop a deep learning-based speaker identification system using Quran recitations. We propose using Bidirectional Long Short-Term Memory (BLSTM), a type of Recurrent Neural Network (RNN) well known for being particularly suitable for speech modeling and processing, for the task of Quranic speaker identification. Our results show that our BLSTM-based Quranic speaker identification delivers significantly improved results compared to previous approaches and is also computationally less expensive.
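To make the BLSTM idea concrete, the following is a minimal sketch of a bidirectional LSTM forward pass used as a speaker classifier. It is not the authors' implementation: the abstract does not specify the input features, layer sizes, or training procedure, so the per-frame feature vectors (13 values per frame, as one might get from MFCC-style features), the hidden size, the number of reciters, and every function name below are illustrative assumptions. The weights are random and untrained, so the output only demonstrates the data flow: one LSTM reads the frames forward, another reads them backward, and their final hidden states are concatenated and passed to a softmax over reciters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_lstm(input_dim, hidden_dim, rng):
    """One (weights, bias) pair per gate: input, forget, candidate, output."""
    return {g: (rand_matrix(hidden_dim, input_dim + hidden_dim, rng),
                [0.0] * hidden_dim) for g in "ifgo"}


def gate(params, name, z, act):
    """Apply act(W z + b) for the named gate."""
    W, b = params[name]
    return [act(sum(w * v for w, v in zip(row, z)) + bi)
            for row, bi in zip(W, b)]


def lstm_final_state(params, frames, hidden_dim):
    """Run a standard LSTM over the frames and return the last hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in frames:
        z = list(x) + h  # current frame concatenated with previous hidden state
        i = gate(params, "i", z, sigmoid)
        f = gate(params, "f", z, sigmoid)
        g = gate(params, "g", z, math.tanh)
        o = gate(params, "o", z, sigmoid)
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_model(input_dim, hidden_dim, num_speakers, seed=0):
    """Hypothetical model container; all sizes are assumptions, not the paper's."""
    rng = random.Random(seed)
    return {
        "hidden_dim": hidden_dim,
        "fwd": init_lstm(input_dim, hidden_dim, rng),
        "bwd": init_lstm(input_dim, hidden_dim, rng),
        "out": (rand_matrix(num_speakers, 2 * hidden_dim, rng),
                [0.0] * num_speakers),
    }


def blstm_speaker_probs(model, frames):
    """Return one probability per candidate reciter for a sequence of frames."""
    n = model["hidden_dim"]
    h_fwd = lstm_final_state(model["fwd"], frames, n)        # left-to-right pass
    h_bwd = lstm_final_state(model["bwd"], frames[::-1], n)  # right-to-left pass
    features = h_fwd + h_bwd                                 # concatenate both directions
    W, b = model["out"]
    scores = [sum(w * v for w, v in zip(row, features)) + bi
              for row, bi in zip(W, b)]
    return softmax(scores)
```

Calling `blstm_speaker_probs(build_model(13, 8, 5), frames)` with `frames` as a list of 13-dimensional feature vectors returns five probabilities that sum to 1. In practice the weights would be trained on labelled recitations, and a deep learning framework would replace this hand-written loop.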
