Convolutional Character Networks

Recent progress has been made on developing a unified framework for joint text detection and recognition in natural images, but existing joint models were mostly built on two-stage framework by involving ROI pooling, which can degrade the performance on recognition task. In this work, we propose convolutional character networks, referred as CharNet, which is an one-stage model that can process two tasks simultaneously in one pass. CharNet directly outputs bounding boxes of words and characters, with corresponding character labels. We utilize character as basic element, allowing us to overcome the main difficulty of existing approaches that attempted to optimize text detection jointly with a RNN-based recognition branch. In addition, we develop an iterative character detection approach able to transform the ability of character detection learned from synthetic data to real-world images. These technical improvements result in a simple, compact, yet powerful one-stage model that works reliably on multi-orientation and curved text. We evaluate CharNet on three standard benchmarks, where it consistently outperforms the state-of-the-art approaches [25, 24] by a large margin, e.g., with improvements of 65.33%->71.08% (with generic lexicon) on ICDAR 2015, and 54.0%->69.23% on Total-Text, on end-to-end text recognition. Code is available at: https://github.com/MalongTech/research-charnet.

[1]  Yuning Jiang,et al.  UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[2]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[4]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[5]  Shuigeng Zhou,et al.  Edit Probability for Scene Text Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Wei Li,et al.  R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection , 2017, ArXiv.

[7]  Wenyu Liu,et al.  TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[8]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[9]  Jürgen Schmidhuber,et al.  Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.

[10]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[11]  Xiang Bai,et al.  ASTER: An Attentional Scene Text Recognizer with Flexible Rectification , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Weilin Huang,et al.  Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors , 2013, 2013 IEEE International Conference on Computer Vision.

[13]  Yeongjae Cheon,et al.  PVANet: Lightweight Deep Neural Networks for Real-time Object Detection , 2016, ArXiv.

[14]  Wafa Khlif,et al.  ICDAR2017 Robust Reading Challenge on Multi-Lingual Scene Text Detection and Script Identification - RRC-MLT , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[15]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Jiri Matas,et al.  Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Errui Ding,et al.  TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network , 2018, ACCV.

[19]  Xiaolin Li,et al.  Single Shot Text Detector with Regional Attention , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[21]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Han Hu,et al.  WordSup: Exploiting Word Annotations for Character Based Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Junjie Yan,et al.  FOTS: Fast Oriented Text Spotting with a Unified Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Weilin Huang,et al.  Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees , 2014, ECCV.

[26]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Xiang Bai,et al.  Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Chee Seng Chan,et al.  Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[29]  Xiang Bai,et al.  Symmetry-based text line detection in natural scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jiřı́ Matas,et al.  Real-time scene text localization and recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[33]  Chunhua Shen,et al.  Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Shuigeng Zhou,et al.  Focusing Attention: Towards Accurate Text Recognition in Natural Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Wei Zhou,et al.  TextField: Learning a Deep Direction Field for Irregular Scene Text Detection , 2018, IEEE Transactions on Image Processing.

[36]  Changming Sun,et al.  An End-to-End TextSpotter with Explicit Alignment and Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Xiang Bai,et al.  An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Pan He,et al.  Reading Scene Text in Deep Convolutional Sequences , 2015, AAAI.

[39]  Wei Liu,et al.  Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition , 2018, AAAI.

[40]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.