论文信息 - Improved localization accuracy by LocNet for Faster R-CNN based text detection in natural scene images

Improved localization accuracy by LocNet for Faster R-CNN based text detection in natural scene images

Abstract Although Faster R-CNN based text detection approaches have achieved promising results, their localization accuracy is not satisfactory in certain cases due to their sub-optimal bounding box regression based localization modules. In this paper, we address this problem and propose replacing the bounding box regression module with a novel LocNet based localization module to improve the localization accuracy of a Faster R-CNN based text detector. Given a proposal generated by a region proposal network (RPN), instead of directly predicting the bounding box coordinates of the concerned text instance, the proposal is enlarged to create a search region so that an “In-Out” conditional probability to each row and column of this search region is assigned, which can then be used to accurately infer the concerned bounding box. Furthermore, we present a simple yet effective two-stage approach to convert the difficult multi-oriented text detection problem to a relatively easier horizontal text detection problem, which makes our approach able to robustly detect multi-oriented text instances with accurate bounding box localization. Experiments demonstrate that the proposed approach boosts the localization accuracy of Faster R-CNN based text detectors significantly. Consequently, our new text detector has achieved superior performance on both horizontal (ICDAR-2011, ICDAR-2013 and MULTILIGUL) and multi-oriented (MSRA-TD500, ICDAR-2015) text detection benchmark tasks.

Qiang Huo | Zhuoyao Zhong | Lei Sun

[1] Jean-Michel Jolion,et al. Object count/area graphs for the evaluation of object detection and segmentation algorithms , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[2] Lei Sun,et al. Improved Localization Accuracy by LocNet for Faster R-CNN Based Text Detection , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[3] Seiichi Uchida,et al. Could scene context be beneficial for scene text detection? , 2016, Pattern Recognit..

[4] Yonatan Wexler,et al. Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5] Jon Almazán,et al. ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[6] Trevor Darrell,et al. Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Jun Zhang,et al. Multi-Orientation Scene Text Detection with Adaptive Clustering , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Kavita Bala,et al. Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Lei Sun,et al. A robust approach for text detection from natural scene images , 2015, Pattern Recognit..

[11] Cheng-Lin Liu,et al. A Hybrid Approach to Detect and Localize Texts in Natural Scene Images , 2011, IEEE Transactions on Image Processing.

[12] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Jorge Stolfi,et al. T-HOG: An effective gradient-based descriptor for single line text regions , 2013, Pattern Recognit..

[14] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[15] Andrew Zisserman,et al. Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[16] Yi Li,et al. Orientation Robust Text Line Detection in Natural Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17] Yuning Jiang,et al. UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[18] Yonghong Song,et al. A coarse-to-fine scene text detection method based on Skeleton-cut detector and Binary-Tree-Search based rectification , 2018, Pattern Recognit. Lett..

[19] Dimosthenis Karatzas,et al. A fast hierarchical method for multi-script and arbitrary oriented scene text extraction , 2014, International Journal on Document Analysis and Recognition (IJDAR).

[20] Wenyu Liu,et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[21] Chee Seng Chan,et al. Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[22] Pan He,et al. Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[23] Nikos Komodakis,et al. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24] Dimosthenis Karatzas,et al. TextProposals: A text-specific selective search algorithm for word spotting in the wild , 2016, Pattern Recognit..

[25] Abhinav Gupta,et al. Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Jason Weston,et al. Curriculum learning , 2009, ICML '09.

[28] Wenyu Liu,et al. A Unified Framework for Multioriented Text Detection and Recognition , 2014, IEEE Transactions on Image Processing.

[29] Lianwen Jin,et al. Curved scene text detection via transverse and longitudinal sequence connection , 2019, Pattern Recognit..

[30] Wenyu Liu,et al. Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Nikos Komodakis,et al. LocNet: Improving Localization Accuracy for Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Han Hu,et al. WordSup: Exploiting Word Annotations for Character Based Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Shijian Lu,et al. A pooling based scene text proposal technique for scene text reading in the wild , 2018, Pattern Recognit..

[35] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.

[36] Kaizhu Huang,et al. Robust Text Detection in Natural Scene Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37] Andrew Zisserman,et al. Deep Features for Text Spotting , 2014, ECCV.

[38] Shuchang Zhou,et al. EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Andreas Dengel,et al. ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images , 2011, 2011 International Conference on Document Analysis and Recognition.

[40] Xiangyang Xue,et al. Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[41] Zhuowen Tu,et al. Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[42] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[44] Tao Wang,et al. End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[45] Andrew D. Bagdanov,et al. FAST: Facilitated and Accurate Scene Text Proposals through FCN Guided Pruning , 2017, Pattern Recognit. Lett..

[46] Lianwen Jin,et al. Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48] Shijian Lu,et al. Text Flow: A Unified Text Detection System in Natural Scene Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[49] Xiang Bai,et al. Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[51] Kai Wang,et al. End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[52] Jiri Matas,et al. Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[53] Jiri Matas,et al. A Method for Text Localization and Recognition in Real-World Images , 2010, ACCV.

[54] Anil K. Jain,et al. Text information extraction in images and video: a survey , 2004, Pattern Recognit..

[55] Jiřı́ Matas,et al. Real-time scene text localization and recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[56] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57] Fei Yin,et al. Deep Direct Regression for Multi-oriented Scene Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).