Ligature categorization based Nastaliq Urdu recognition using deep neural networks

The cursive script, the Nastaliq writing style, and the large number of distinct ligatures make ligature recognition in Urdu very difficult. In this paper, we present a segmentation-free approach that recognizes Urdu ligatures holistically. First, we generate a rich dataset of 17,010 ligatures with different orientations and different degrees of noise. Second, we cluster (categorize) the ligatures to reduce the search space and make learning more robust. Finally, we employ a deep neural network with dropout regularization to classify the ligatures. Detailed experiments show that combining dropout regularization with ligature clustering significantly improves classification accuracy.
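
The two ideas named in the abstract, routing a ligature to a cluster before classification and regularizing the network with dropout, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering method, the features, or the network, so the nearest-centroid routing, the "inverted" dropout variant, the function names `nearest_cluster` and `dropout`, and all numeric values below are assumptions chosen for illustration.

```python
import random


def nearest_cluster(features, centroids):
    """Return the index of the centroid closest to `features`.

    Assumption: squared Euclidean distance to precomputed centroids;
    the paper's actual categorization scheme may differ.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(centroids)), key=lambda i: sq_dist(features, centroids[i]))


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`
    and rescale the surviving units by 1 / (1 - rate).

    At inference time (training=False) activations pass through unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical toy values, not taken from the paper.
centroids = [[0.0, 0.0], [10.0, 10.0]]
cluster_id = nearest_cluster([9.0, 8.5], centroids)

rng = random.Random(0)
hidden = dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=rng)
```

Routing first means a classifier only has to discriminate among the ligatures of one cluster, which is one plausible reading of "reduce the search space"; dropout is applied only during training, so inference uses the full network.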
