Recognition of Urdu ligatures - a holistic approach

This paper presents an effective segmentation-free and scale-invariant technique for the recognition of Urdu ligatures in the Nastaliq font. The proposed technique separates the main body of each ligature from its secondary components and trains a separate hidden Markov model for each. Features capturing the projection, concavity and curvature information of ligatures are extracted using right-to-left sliding windows and fed to the models for training. The system, trained and evaluated on more than 2,000 frequently occurring Urdu ligatures from a standard database, achieved a recognition rate of 97.93%.
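As a rough illustration of right-to-left sliding-window feature extraction, the sketch below scans a binarized ligature image from its right edge to its left edge and emits one small feature vector per window. It is not the paper's feature set: the function name `sliding_window_features`, the window width and step, and the two features (ink density and a height-normalized vertical centroid, standing in for projection-style features) are all assumptions made for illustration. The concavity and curvature features, and the hidden Markov model training itself, are omitted.

```python
def sliding_window_features(image, window_width=2, step=1):
    """Return one (density, centroid) pair per window, rightmost window first.

    `image` is a list of rows, each row a list of 0/1 values (1 = ink).
    Both features are hypothetical stand-ins for projection features.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    features = []

    # Start at the right edge and move left, following the writing direction.
    right = width
    while right > 0:
        left = max(0, right - window_width)

        # Horizontal projection inside the window: ink pixels per row.
        row_counts = [sum(row[left:right]) for row in image]
        ink = sum(row_counts)

        # Fraction of window pixels that are ink.
        density = ink / (height * (right - left))

        # Vertical centre of mass of the ink, divided by the image height
        # so the value does not depend on the absolute image size.
        if ink and height > 1:
            centroid = sum(y * c for y, c in enumerate(row_counts)) / ink
            centroid /= (height - 1)
        else:
            centroid = 0.0

        features.append((density, centroid))
        right -= step

    return features


if __name__ == "__main__":
    toy = [
        [0, 0, 1],
        [0, 1, 1],
        [0, 0, 0],
    ]
    for vector in sliding_window_features(toy, window_width=1):
        print(vector)
```

The resulting sequence of vectors is the kind of observation sequence that could then be passed to a per-class hidden Markov model; dividing by the image height is only one simple way to reduce sensitivity to scale.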
