论文信息 - TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting

TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting

Most existing text spotting methods either focus on horizontal/oriented texts or perform arbitrary shaped text spotting with character-level annotations. In this paper, we propose a novel text spotting framework to detect and recognize text of arbitrary shapes in an end-to-end manner, using only word/line-level annotations for training. Motivated from the name of TextSnake, which is only a detection model, we call the proposed text spotting framework TextDragon. In TextDragon, a text detector is designed to describe the shape of text with a series of quadrangles, which can handle text of arbitrary shapes. To extract arbitrary text regions from feature maps, we propose a new differentiable operator named RoISlide, which is the key to connect arbitrary shaped text detection and recognition. Based on the extracted features through RoISlide, a CNN and CTC based text recognizer is introduced to make the framework free from labeling the location of characters. The proposed method achieves state-of-the-art performance on two curved text benchmarks CTW1500 and Total-Text, and competitive results on the ICDAR 2015 Dataset.

[1] Tao Wang,et al. End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[2] Wenyu Liu,et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[3] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[4] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] C. V. Jawahar,et al. Top-down and bottom-up cues for scene text recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6] Jiri Matas,et al. E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text , 2018, ACCV Workshops.

[7] Hartmut Neven,et al. PhotoOCR: Reading Text in Uncontrolled Conditions , 2013, 2013 IEEE International Conference on Computer Vision.

[8] Junjie Yan,et al. FOTS: Fast Oriented Text Spotting with a Unified Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9] Chee Seng Chan,et al. Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[10] Ernest Valveny,et al. ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[11] Lianwen Jin,et al. Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Shijian Lu,et al. Text Flow: A Unified Text Detection System in Natural Scene Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13] Xiang Bai,et al. Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[15] Chunhua Shen,et al. Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16] Fei Yin,et al. Deep Direct Regression for Multi-oriented Scene Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17] Xiang Bai,et al. Scene text detection and recognition: recent advances and future trends , 2015, Frontiers of Computer Science.

[18] Andrew Zisserman,et al. Deep Features for Text Spotting , 2014, ECCV.

[19] Shuchang Zhou,et al. EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Yi-Chao Wu,et al. Scene Text Recognition with Sliding Convolutional Character Models , 2017, ArXiv.

[21] Wei Liu,et al. STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition , 2016, BMVC.

[22] Xiang Bai,et al. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[24] Kai Wang,et al. End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[25] Xiang Bai,et al. Robust Scene Text Recognition with Automatic Rectification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Shuigeng Zhou,et al. Arbitrarily-Oriented Text Recognition , 2017, ArXiv.

[27] Xiang Bai,et al. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28] Kaiming He,et al. Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29] Dimosthenis Karatzas,et al. TextProposals: A text-specific selective search algorithm for word spotting in the wild , 2016, Pattern Recognit..

[30] Cheng-Lin Liu,et al. Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Shijian Lu,et al. Accurate Scene Text Recognition Based on Recurrent Neural Network , 2014, ACCV.

[32] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.

[33] Andrew Zisserman,et al. Spatial Transformer Networks , 2015, NIPS.

[34] Andrew Zisserman,et al. Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition , 2014, ArXiv.

[35] Jiri Matas,et al. Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36] Xin He,et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes , 2018, ECCV.

[37] Abhinav Gupta,et al. Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[39] Changming Sun,et al. An End-to-End TextSpotter with Explicit Alignment and Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[41] Pan He,et al. Reading Scene Text in Deep Convolutional Sequences , 2015, AAAI.

[42] Huizhong Chen,et al. Robust text detection in natural images with edge-enhanced Maximally Stable Extremal Regions , 2011, 2011 18th IEEE International Conference on Image Processing.

[43] Ankush Gupta,et al. Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44] David S. Doermann,et al. Text Detection and Recognition in Imagery: A Survey , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45] Andrew Zisserman,et al. Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[46] Lianwen Jin,et al. Detecting Curve Text in the Wild: New Dataset and New Solution , 2017, ArXiv.

[47] Xiang Bai,et al. TextBoxes++: A Single-Shot Oriented Scene Text Detector , 2018, IEEE Transactions on Image Processing.

[48] Weilin Huang,et al. Text-Attentional Convolutional Neural Network for Scene Text Detection , 2015, IEEE Transactions on Image Processing.

[49] Xuelong Li,et al. PixelLink: Detecting Scene Text via Instance Segmentation , 2018, AAAI.