Optical Character Recognition System for Urdu Words in Nastaliq Font

Optical Character Recognition (OCR) has been an attractive research area for the last three decades, and mature OCR systems reporting recognition rates close to 100% are available today for many scripts and languages. Despite these developments, research on text recognition in many languages, Urdu among them, is still in its early days. The limited existing literature on Urdu OCR either addresses isolated characters only or considers restricted vocabularies in fixed font sizes. This research presents a segmentation-free and size-invariant technique for recognizing Urdu words in Nastaliq font, using ligatures as the units of recognition. Ligatures, separated into primary ligatures and diacritics, are recognized using right-to-left hidden Markov models (HMMs). Diacritics are then associated with the main body using position information, and the resulting ligatures are validated against a dictionary. Evaluated on Urdu words, the system achieved promising recognition rates at both the ligature and word levels.
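The abstract does not specify the feature set, the number of HMM states, or exactly how "right-to-left" is realised. The following minimal Python sketch illustrates HMM-based ligature classification under stated assumptions: "right-to-left" is taken to mean that the ligature image is scanned column by column from right to left (the writing direction of Urdu), each column is reduced to a quantised ink count, and each ligature class is a small discrete HMM with a Bakis topology (states advance forward in time while frames are read right to left), scored with the forward algorithm. All function names, model labels (`heavy_right`, `heavy_left`) and probabilities are hypothetical toy values, not the authors' models.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def rtl_column_features(image, levels=4):
    """Scan a binary ligature image (list of rows of 0/1) from the rightmost
    column to the leftmost and quantise each column's ink count into a
    discrete symbol in 0..levels-1 (a hypothetical toy feature)."""
    height = len(image)
    symbols = []
    for col in reversed(range(len(image[0]))):
        ink = sum(row[col] for row in image)
        # ink is in 0..height, so the integer division below is always < levels.
        symbols.append(ink * levels // (height + 1))
    return symbols


def bakis_hmm(emissions, stay=0.6):
    """Build a Bakis topology: each state either repeats or advances to the
    next state; the last state is absorbing. Returns (pi, A, B)."""
    n = len(emissions)
    pi = [1.0] + [0.0] * (n - 1)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i] = stay
        A[i][i + 1] = 1.0 - stay
    A[n - 1][n - 1] = 1.0
    return pi, A, emissions


def log_likelihood(obs, pi, A, B):
    """Forward algorithm in log space: log P(obs | model) for a discrete HMM."""
    n = len(pi)
    alpha = [safe_log(pi[i]) + safe_log(B[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(A[i][j]) for i in range(n)])
            + safe_log(B[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)


def recognise(image, models):
    """Return the label of the ligature model with the highest likelihood."""
    obs = rtl_column_features(image)
    scores = {label: log_likelihood(obs, *hmm) for label, hmm in models.items()}
    return max(scores, key=scores.get), scores


# Toy emission distributions over the four ink-density symbols (hypothetical).
HIGH = [0.05, 0.05, 0.10, 0.80]
LOW = [0.70, 0.20, 0.05, 0.05]
models = {
    "heavy_right": bakis_hmm([HIGH, LOW]),  # dense columns come first when read right to left
    "heavy_left": bakis_hmm([LOW, HIGH]),
}
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
]
label, scores = recognise(image, models)
print(label)  # heavy_right
```

In a full system along the lines of the abstract, diacritics would be recognised by their own models, attached to a main body by position, and the assembled ligatures checked against a dictionary; those steps are omitted from this sketch.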
