Cluttered TextSpotter: An End-to-End Trainable Light-Weight Scene Text Spotter for Cluttered Environment

Scene text spotting aims to simultaneously localize and recognize text instances, symbols, and logos in natural scene images. Scene text detection and recognition approaches have received immense attention in the computer vision research community. Partial occlusion and truncation artifacts caused by the cluttered backgrounds of scene images obstruct the perception of text instances, which makes spotting very complex. In this paper, we propose a light-weight scene text spotter that addresses the cluttered environments of scene images. It is an end-to-end trainable deep neural network that uses local part information, global structural features, and context cues of oriented region proposals to spot text instances. This helps it localize text in scene images with background clutter, where partially occluded text parts, truncation artifacts, and perspective distortions are present. We mitigate the misclassification caused by inter-class interference by exploring inter-class separability and intra-class compactness. We also incorporate multi-language character segmentation and word-level recognition in a light-weight recognition module. We evaluate the network on six publicly available benchmark datasets, on different smart devices, to illustrate its efficacy.
