Curved scene text detection via transverse and longitudinal sequence connection

Abstract Curved text detection is a difficult problem that has not been addressed sufficiently. To highlight the difficulties in reading curved text in a real environment, we constructed a curved text dataset called CTW1500, which includes over 10,000 text annotations in 1500 images, and used it to formulate a polygon-based curved text detector that can detect curved text without using an empirical combination. With the seamless integration of recurrent transverse and longitudinal offset connection, our method explores context information instead of predicting points independently, resulting in smoother and more accurate detection. Our approach is designed as a universal method, meaning it can be trained using rectangular or quadrilateral bounding boxes, requiring no extra effort. Experimental results on the CTW1500 dataset and Total-text demonstrated that our method with only a light backbone can outperform state-of-the-art methods by a large margin. Our method also achieved state-of-the-art performance on the MSRA-TD500 dataset, demonstrating its promising generalization ability. Code, datasets, and label-tool are available at https://github.com/Yuliang-Liu/Curve-Text-Detector .

[1]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Jun Zhang,et al.  Multi-Orientation Scene Text Detection with Adaptive Clustering , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Dimosthenis Karatzas,et al.  TextProposals: A text-specific selective search algorithm for word spotting in the wild , 2016, Pattern Recognit..

[4]  Naoyuki Morimoto,et al.  ICDAR2017 Robust Reading Challenge on Omnidirectional Video , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[5]  Bidyut Baran Chaudhuri,et al.  An approach for detecting and cleaning of struck-out handwritten text , 2017, Pattern Recognit..

[6]  Palaiahnakote Shivakumara,et al.  A robust arbitrary text detection system for natural scene images , 2014, Expert Syst. Appl..

[7]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[8]  Alan L. Yuille,et al.  Detecting and reading text in natural scenes , 2004, CVPR 2004.

[9]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[10]  Andreas Dengel,et al.  ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images , 2011, 2011 International Conference on Document Analysis and Recognition.

[11]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[12]  Xiang Bai,et al.  TextBoxes++: A Single-Shot Oriented Scene Text Detector , 2018, IEEE Transactions on Image Processing.

[13]  Simon Osindero,et al.  Recursive Recurrent Nets with Attention Modeling for OCR in the Wild , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Wafa Khlif,et al.  ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[15]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[16]  Lianwen Jin,et al.  Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Xiang Bai,et al.  Robust Scene Text Recognition with Automatic Rectification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Yuting Gao,et al.  Fused Text Segmentation Networks for Multi-oriented Scene Text Detection , 2017, 2018 24th International Conference on Pattern Recognition (ICPR).

[19]  Palaiahnakote Shivakumara,et al.  Fractals based multi-oriented text detection system for recognition in mobile video images , 2017, Pattern Recognit..

[20]  Xiang Bai,et al.  An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[22]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Simon M. Lucas,et al.  ICDAR 2003 robust reading competitions , 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings..

[24]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Fei Yin,et al.  Deep Direct Regression for Multi-oriented Scene Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Palaiahnakote Shivakumara,et al.  A blind deconvolution model for scene text detection and recognition in video , 2016, Pattern Recognit..

[27]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[28]  Wenyu Liu,et al.  Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[30]  Shijian Lu,et al.  ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17) , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[31]  Luc Van Gool,et al.  Efficient Non-Maximum Suppression , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[32]  Xiang Bai,et al.  Text/non-text image classification in the wild with convolutional neural networks , 2017, Pattern Recognit..

[33]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[34]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Seiichi Uchida,et al.  Could scene context be beneficial for scene text detection? , 2016, Pattern Recognit..

[36]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[37]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[38]  Han Hu,et al.  WordSup: Exploiting Word Annotations for Character Based Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Chucai Yi,et al.  Text String Detection From Natural Scenes by Structure-Based Partition and Grouping , 2011, IEEE Transactions on Image Processing.

[40]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[41]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Kai Wang,et al.  Word Spotting in the Wild , 2010, ECCV.

[43]  David S. Doermann,et al.  Text Detection and Recognition in Imagery: A Survey , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Lei Sun,et al.  A robust approach for text detection from natural scene images , 2015, Pattern Recognit..

[45]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Jianfei Cai,et al.  Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[47]  Chee Seng Chan,et al.  Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[48]  Klaus Meyer-Wegener,et al.  NEOCR: A Configurable Dataset for Natural Image Text Recognition , 2011, CBDAR.

[49]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[50]  Palaiahnakote Shivakumara,et al.  A new multi-modal approach to bib number/text detection and recognition in Marathon images , 2017, Pattern Recognit..