Recognition of Urdu Handwritten Characters Using Convolutional Neural Network

In pattern recognition and pattern matching, methods based on deep learning models have recently attracted considerable research attention owing to their strong performance. In this paper, we propose the use of a convolutional neural network to recognize multifont offline handwritten Urdu characters in an unconstrained environment. We also introduce a new dataset of handwritten Urdu characters, since no publicly available dataset of this kind exists. We perform a series of experiments on the proposed dataset. The character recognition accuracy achieved is among the best reported in the literature for the same task.
