TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Driven by deep neural networks and large scale datasets, scene text detection methods have progressed substantially over the past years, continuously refreshing the performance records on various standard benchmarks. However, limited by the representations (axis-aligned rectangles, rotated rectangles or quadrangles) adopted to describe text, existing methods may fall short when dealing with much more free-form text instances, such as curved text, which are actually very common in real-world scenarios. To tackle this problem, we propose a more flexible representation for scene text, termed as TextSnake, which is able to effectively represent text instances in horizontal, oriented and curved forms. In TextSnake, a text instance is described as a sequence of ordered, overlapping disks centered at symmetric axes, each of which is associated with potentially variable radius and orientation. Such geometry attributes are estimated via a Fully Convolutional Network (FCN) model. In experiments, the text detector based on TextSnake achieves state-of-the-art or comparable performance on Total-Text and SCUT-CTW1500, the two newly published benchmarks with special emphasis on curved text in natural images, as well as the widely-used datasets ICDAR 2015 and MSRA-TD500. Specifically, TextSnake outperforms the baseline on Total-Text by more than 40% in F-measure.

[1]  Wenyu Liu,et al.  Strokelets: A Learned Multi-scale Representation for Scene Text Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Xuelong Li,et al.  PixelLink: Detecting Scene Text via Instance Segmentation , 2018, AAAI.

[3]  Jean-Michel Jolion,et al.  Object count/area graphs for the evaluation of object detection and segmentation algorithms , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[4]  Jiri Matas,et al.  A Method for Text Localization and Recognition in Real-World Images , 2010, ACCV.

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yi Li,et al.  Orientation Robust Text Line Detection in Natural Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Xiaolin Li,et al.  Single Shot Text Detector with Regional Attention , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Xiang Bai,et al.  An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[10]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[11]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[13]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[16]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Lianwen Jin,et al.  Feature Enhancement Network: A Refined Scene Text Detector , 2017, AAAI.

[18]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[19]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  David S. Doermann,et al.  Text Detection and Recognition in Imagery: A Survey , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Fei Yin,et al.  Deep Direct Regression for Multi-oriented Scene Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Zihan Zhou,et al.  Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Xiang Bai,et al.  ASTER: An Attentional Scene Text Recognizer with Flexible Rectification , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Wenyu Liu,et al.  TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[26]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[27]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[28]  Xiang Bai,et al.  Symmetry-based text line detection in natural scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Shijian Lu,et al.  WeText: Scene Text Detection under Weak Supervision , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[31]  Lianwen Jin,et al.  Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Xiang Bai,et al.  Scene text detection and recognition: recent advances and future trends , 2015, Frontiers of Computer Science.

[35]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[36]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[37]  Shuchang Zhou,et al.  Scene Text Detection via Holistic, Multi-Channel Prediction , 2016, ArXiv.

[38]  Yue Wu,et al.  Self-Organized Text Detection with Minimal Post-processing via Border Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Wenyu Liu,et al.  Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Chee Seng Chan,et al.  Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[41]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[42]  Han Hu,et al.  WordSup: Exploiting Word Annotations for Character Based Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[43]  Yi Yang,et al.  DenseBox: Unifying Landmark Localization with End to End Object Detection , 2015, ArXiv.

[44]  Weilin Huang,et al.  Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees , 2014, ECCV.

[45]  Shuicheng Yan,et al.  Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[47]  Andrew Zisserman,et al.  Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[48]  Lianwen Jin,et al.  Detecting Curve Text in the Wild: New Dataset and New Solution , 2017, ArXiv.

[49]  Kaizhu Huang,et al.  Robust Text Detection in Natural Scene Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Andrew Zisserman,et al.  Deep Features for Text Spotting , 2014, ECCV.

[51]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Wenyu Liu,et al.  A Unified Framework for Multioriented Text Detection and Recognition , 2014, IEEE Transactions on Image Processing.