Boosting up Scene Text Detectors with Guided CNN

Deep CNNs have achieved great success in text detection. Most of existing methods attempt to improve accuracy with sophisticated network design, while paying less attention on speed. In this paper, we propose a general framework for text detection called Guided CNN to achieve the two goals simultaneously. The proposed model consists of one guidance subnetwork, where a guidance mask is learned from the input image itself, and one primary text detector, where every convolution and non-linear operation are conducted only in the guidance mask. On the one hand, the guidance subnetwork filters out non-text regions coarsely, greatly reduces the computation complexity. On the other hand, the primary text detector focuses on distinguishing between text and hard non-text regions and regressing text bounding boxes, achieves a better detection accuracy. A training strategy, called background-aware block-wise random synthesis, is proposed to further boost up the performance. We demonstrate that the proposed Guided CNN is not only effective but also efficient with two state-of-the-art methods, CTPN and EAST, as backbones. On the challenging benchmark ICDAR 2013, it speeds up CTPN by 2.9 times on average, while improving the F-measure by 1.5%. On ICDAR 2015, it speeds up EAST by 2.0 times while improving the F-measure by 1.0%.

[1]  Fan Jiang,et al.  Deep Scene Text Detection with Connected Component Proposals , 2017, ArXiv.

[2]  Kaizhu Huang,et al.  Robust Text Detection in Natural Scene Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Andrew Zisserman,et al.  Deep Features for Text Spotting , 2014, ECCV.

[4]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Weilin Huang,et al.  Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors , 2013, 2013 IEEE International Conference on Computer Vision.

[6]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Hao Wang,et al.  Detecting Faces Using Inside Cascaded Contextual CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Jiri Matas,et al.  Real-Time Lexicon-Free Scene Text Localization and Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Yeongjae Cheon,et al.  PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection , 2016, ArXiv.

[10]  Yoshua Bengio,et al.  BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.

[11]  David S. Doermann,et al.  Automatic text detection and tracking in digital video , 2000, IEEE Trans. Image Process..

[12]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[13]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Pan He,et al.  Reading Scene Text in Deep Convolutional Sequences , 2015, AAAI.

[15]  Xiaolin Hu,et al.  Joint Training of Cascaded CNN for Face Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Andrew Zisserman,et al.  Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[17]  Weilin Huang,et al.  Text-Attentional Convolutional Neural Network for Scene Text Detection , 2015, IEEE Transactions on Image Processing.

[18]  Han Hu,et al.  WordSup: Exploiting Word Annotations for Character Based Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[19]  Andrew Zisserman,et al.  Speeding up Convolutional Neural Networks with Low Rank Expansions , 2014, BMVC.

[20]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[21]  Lianwen Jin,et al.  Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Jian Sun,et al.  Accelerating Very Deep Convolutional Networks for Classification and Detection , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Ivan V. Oseledets,et al.  Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition , 2014, ICLR.

[26]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[27]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[28]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[29]  Fei Yin,et al.  Deep Direct Regression for Multi-oriented Scene Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Alan L. Yuille,et al.  Detecting and reading text in natural scenes , 2004, CVPR 2004.

[31]  Wei Li,et al.  R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection , 2017, ArXiv.

[32]  Jiřı́ Matas,et al.  Real-time scene text localization and recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Yuting Gao,et al.  Fused Text Segmentation Networks for Multi-oriented Scene Text Detection , 2017, 2018 24th International Conference on Pattern Recognition (ICPR).

[34]  Tao Wang,et al.  End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[35]  Ignazio Gallo,et al.  Text Localization Based on Fast Feature Pyramids and Multi-Resolution Maximally Stable Extremal Regions , 2014, ACCV Workshops.

[36]  Wenyu Liu,et al.  TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[37]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[38]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[39]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[40]  Qingshan Liu,et al.  Robust Text Detection in Natural Scenes Using Text Geometry and Visual Appearance , 2014, Int. J. Autom. Comput..

[41]  Xiaoxiao Li,et al.  Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[43]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Junjie Yan,et al.  Mimicking Very Efficient Network for Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[46]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[47]  Wenyu Liu,et al.  Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Dharmendra S. Modha,et al.  Backpropagation for Energy-Efficient Neuromorphic Computing , 2015, NIPS.

[49]  Chunheng Wang,et al.  Scene Text Recognition Using Part-Based Tree-Structured Character Detection , 2013, CVPR 2013.

[50]  Chunhua Shen,et al.  Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[51]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[52]  Jian Cheng,et al.  Quantized Convolutional Neural Networks for Mobile Devices , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Matthieu Cord,et al.  Snoopertext: A multiresolution system for text detection in complex visual scenes , 2010, 2010 IEEE International Conference on Image Processing.

[54]  Jiri Matas,et al.  Efficient Scene text localization and recognition with local character refinement , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[55]  Weilin Huang,et al.  Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees , 2014, ECCV.

[56]  Rainer Lienhart,et al.  Localizing and segmenting text in images and videos , 2002, IEEE Trans. Circuits Syst. Video Technol..

[57]  Shuchang Zhou,et al.  Scene Text Detection via Holistic, Multi-Channel Prediction , 2016, ArXiv.

[58]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[59]  Joan Bruna,et al.  Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation , 2014, NIPS.

[60]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Xiaolin Li,et al.  Single Shot Text Detector with Regional Attention , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).