Urdu-Text Detection and Recognition in Natural Scene Images Using Deep Learning

Urdu is written in a cursive script and, like Arabic, Chinese, and Hindi, belongs to the non-Latin family of scripts. Detecting and localizing Urdu text in natural scene images is challenging, and so, consequently, is recognizing the individual ligatures in those images. This paper proposes a methodology that covers detection, orientation prediction, and recognition of Urdu ligatures in outdoor images. As a first step, a custom Faster R-CNN detector is combined with well-known CNNs (SqueezeNet, GoogLeNet, ResNet18, and ResNet50) as feature extractors to detect and localize text in images of size $320\times 240$ pixels. For ligature orientation prediction, a custom Regression Residual Neural Network (RRNN) is trained and tested on datasets containing randomly oriented ligatures. Ligatures are recognized using a Two Stream Deep Neural Network (TSDNN). In our experiments, five datasets containing 4.2K and 51K synthetic images with embedded Urdu text were generated from the CLE annotation text to evaluate the detection, orientation prediction, and recognition tasks. The 4.2K-image and 51K-image sets contain 132 and 1600 unique ligatures respectively, with 32 variations of each ligature (4 backgrounds and 8 font-color variations). In addition, 1094 real-world images containing more than 12K Urdu characters were used to evaluate the TSDNN. All four detectors were compared on their ability to detect and localize Urdu text using average precision (AP). The Faster R-CNN based on ResNet50 features was the best detector, with an AP of 0.98, while the SqueezeNet, GoogLeNet, and ResNet18 based detectors had testing APs of 0.65, 0.88, and 0.87 respectively. The RRNN achieved an accuracy of 79% and 99% on the 4.2K and 51K images respectively. For character classification in ligatures, the TSDNN attained a partial sequence recognition rate of 94.90% and 95.20% on the 4.2K and 51K images respectively, and a partial sequence recognition rate of 76.60% on the real-world images.
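
The synthetic dataset sizes quoted above follow from the ligature counts and the number of rendering variations. The short check below is a sketch of that arithmetic only; the function and constant names are illustrative and do not come from the paper, and it assumes every ligature is rendered once per background and font-color combination.

```python
# Figures taken from the abstract; names are illustrative, not the authors' code.
BACKGROUNDS = 4   # background variations per ligature
FONT_COLORS = 8   # font-color variations per ligature


def synthetic_image_count(unique_ligatures: int) -> int:
    """Number of images if each ligature is rendered once for every
    background and font-color combination (assumed here)."""
    return unique_ligatures * BACKGROUNDS * FONT_COLORS


small_set = synthetic_image_count(132)    # the "4.2K" dataset
large_set = synthetic_image_count(1600)   # the "51K" dataset
print(small_set, large_set)  # 4224 51200
```

The results, 4224 and 51200 images, match the rounded 4.2K and 51K figures.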
