WordSup: Exploiting Word Annotations for Character Based Text Detection

Text in images is usually organized as a hierarchy of visual elements: characters, words, text lines and text blocks. Among these elements, the character is the most basic one across many kinds of text, such as Western, Chinese and Japanese scripts and mathematical expressions. It is therefore natural and convenient to build a common text detection engine on top of character detectors. However, training character detectors requires a vast number of characters annotated with their locations, which are expensive to obtain, and existing real text datasets are mostly annotated at the word or line level. To resolve this dilemma, we propose a weakly supervised framework that can use word annotations, either tight quadrangles or looser bounding boxes, for character detector training. Applied to scene text detection, the framework lets us train a robust character detector by exploiting word annotations in rich, large-scale real scene text datasets, e.g. ICDAR15 [19] and COCO-Text [39]. The character detector plays a key role in the pipeline of our text detection engine. It achieves state-of-the-art performance on several challenging scene text detection benchmarks. We also demonstrate the flexibility of our pipeline in various scenarios, including deformed text detection and math expression recognition.

[1]  Weilin Huang,et al.  Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network , 2016, ArXiv.

[2]  Weilin Huang,et al.  Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees , 2014, ECCV.

[3]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[4]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[5]  Huizhong Chen,et al.  Robust text detection in natural images with edge-enhanced Maximally Stable Extremal Regions , 2011, 2011 18th IEEE International Conference on Image Processing.

[6]  John Beidler,et al.  Data Structures and Algorithms , 1996, Wiley Encyclopedia of Computer Science and Engineering.

[7]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Kai Wang,et al.  Word Spotting in the Wild , 2010, ECCV.

[9]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[10]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[11]  Tao Wang,et al.  End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[12]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[13]  Lianwen Jin,et al.  Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Shijian Lu,et al.  Text Flow: A Unified Text Detection System in Natural Scene Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Weilin Huang,et al.  Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors , 2013, 2013 IEEE International Conference on Computer Vision.

[17]  Azriel Rosenfeld,et al.  Document structure analysis algorithms: a literature survey , 2003, IS&T/SPIE Electronic Imaging.

[18]  Herbert F. Schantz,et al.  History of OCR, Optical Character Recognition , 1982.

[19]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in Natural Images , 2012.

[20]  Fuchun Sun,et al.  HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yi Li,et al.  Orientation Robust Text Line Detection in Natural Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Siyu Zhu,et al.  A Text Detection System for Natural Scenes with Convolutional Feature Learning and Cascaded Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Han Hu,et al.  Context-aware mathematical expression recognition: An end-to-end framework and a benchmark , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[24]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[26]  Fred L. Bookstein,et al.  Principal Warps: Thin-Plate Splines and the Decomposition of Deformations , 1989, IEEE Trans. Pattern Anal. Mach. Intell.

[27]  Andrew Zisserman,et al.  Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition , 2014, ArXiv.

[28]  Kaizhu Huang,et al.  Robust Text Detection in Natural Scene Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Andrew Zisserman,et al.  Deep Features for Text Spotting , 2014, ECCV.

[30]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Xiang Bai,et al.  Symmetry-based text line detection in natural scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yao Li,et al.  Characterness: An Indicator of Text in the Wild , 2013, IEEE Transactions on Image Processing.

[33]  Fei Yin,et al.  Handwritten Chinese text line segmentation by clustering with distance metric learning , 2009, Pattern Recognit.

[34]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Jun Zhang,et al.  Multi-Orientation Scene Text Detection with Adaptive Clustering , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Jiri Matas,et al.  COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images , 2016, ArXiv.

[37]  Yi Yang,et al.  DenseBox: Unifying Landmark Localization with End to End Object Detection , 2015, ArXiv.

[38]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[39]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[40]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[41]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[42]  Shuchang Zhou,et al.  Scene Text Detection via Holistic, Multi-Channel Prediction , 2016, ArXiv.

[43]  Gaofeng Meng,et al.  Extraction of Virtual Baselines from Distorted Document Images Using Curvilinear Projection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[44]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[46]  Andrew Zisserman,et al.  Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[47]  Weilin Huang,et al.  Text-Attentional Convolutional Neural Network for Scene Text Detection , 2015, IEEE Transactions on Image Processing.

[48]  C. V. Jawahar,et al.  Scene Text Recognition using Higher Order Language Priors , 2009, BMVC.

[49]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[50]  Wenyu Liu,et al.  Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).