Paper Information - Urdu Nastaleeq Optical Character Recognition

Urdu Nastaleeq Optical Character Recognition

This paper discusses the characteristics of the Urdu script and of Urdu Nastaleeq, and presents a simple yet novel and robust technique for recognizing printed Urdu script without a lexicon. Urdu belongs to the family of Arabic scripts and is cursive and complex by nature. The main complexity of compound (connected) Urdu text lies not in its connections but in the way a character changes shape depending on whether it is placed at the beginning, in the middle, or at the end of a word. The character recognition technique presented here uses this inherent complexity of the Urdu script to solve the problem. A word is scanned and analyzed for its level of complexity; the point where the level of complexity changes is marked as a character boundary, and the character is segmented and fed to neural networks. A prototype of the system has been tested on Urdu text and currently achieves an average accuracy of 93.4%.
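
The segmentation idea in the abstract can be illustrated with a minimal sketch. The abstract does not define how the "level of complexity" is measured, so the code below assumes a stand-in measure: the number of separate ink runs in each pixel column of a binarized word image. The function names (column_complexity, change_points, segment_columns) and the sample image are hypothetical and are not taken from the paper. The neural network stage is omitted.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels: 0 = background, 1 = ink


def column_complexity(image: Image) -> List[int]:
    """Count ink runs per column (an assumed proxy for 'complexity')."""
    if not image:
        return []
    height, width = len(image), len(image[0])
    levels = []
    for x in range(width):
        runs = 0
        previous = 0
        for y in range(height):
            pixel = image[y][x]
            # A new run starts wherever background is followed by ink.
            if pixel == 1 and previous == 0:
                runs += 1
            previous = pixel
        levels.append(runs)
    return levels


def change_points(levels: List[int]) -> List[int]:
    """Column indices whose level differs from the previous column."""
    return [x for x in range(1, len(levels)) if levels[x] != levels[x - 1]]


def segment_columns(image: Image) -> List[Tuple[int, int]]:
    """Split the image into half-open column ranges at each change point."""
    levels = column_complexity(image)
    if not levels:
        return []
    cuts = [0] + change_points(levels) + [len(levels)]
    return list(zip(cuts[:-1], cuts[1:]))


# Hypothetical 3x6 word image: two strokes on the left, one on the right.
word = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

print(column_complexity(word))  # [2, 2, 2, 1, 1, 1]
print(segment_columns(word))    # [(0, 3), (3, 6)]
```

Each column range returned by segment_columns would then be cropped and passed to a classifier. Because Urdu is written from right to left, a real system would likely process the ranges in reverse order.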
