Improved localization accuracy by LocNet for Faster R-CNN based text detection in natural scene images

Abstract Although Faster R-CNN based text detection approaches have achieved promising results, their localization accuracy is not satisfactory in certain cases due to their sub-optimal bounding box regression based localization modules. In this paper, we address this problem and propose replacing the bounding box regression module with a novel LocNet based localization module to improve the localization accuracy of a Faster R-CNN based text detector. Given a proposal generated by a region proposal network (RPN), instead of directly predicting the bounding box coordinates of the concerned text instance, the proposal is enlarged to create a search region so that an “In-Out” conditional probability to each row and column of this search region is assigned, which can then be used to accurately infer the concerned bounding box. Furthermore, we present a simple yet effective two-stage approach to convert the difficult multi-oriented text detection problem to a relatively easier horizontal text detection problem, which makes our approach able to robustly detect multi-oriented text instances with accurate bounding box localization. Experiments demonstrate that the proposed approach boosts the localization accuracy of Faster R-CNN based text detectors significantly. Consequently, our new text detector has achieved superior performance on both horizontal (ICDAR-2011, ICDAR-2013 and MULTILIGUL) and multi-oriented (MSRA-TD500, ICDAR-2015) text detection benchmark tasks.

[1]  Jean-Michel Jolion,et al.  Object count/area graphs for the evaluation of object detection and segmentation algorithms , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[2]  Lei Sun,et al.  Improved Localization Accuracy by LocNet for Faster R-CNN Based Text Detection , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[3]  Seiichi Uchida,et al.  Could scene context be beneficial for scene text detection? , 2016, Pattern Recognit..

[4]  Yonatan Wexler,et al.  Detecting text in natural scenes with stroke width transform , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[6]  Trevor Darrell,et al.  Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Jun Zhang,et al.  Multi-Orientation Scene Text Detection with Adaptive Clustering , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Lei Sun,et al.  A robust approach for text detection from natural scene images , 2015, Pattern Recognit..

[11]  Cheng-Lin Liu,et al.  A Hybrid Approach to Detect and Localize Texts in Natural Scene Images , 2011, IEEE Transactions on Image Processing.

[12]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Jorge Stolfi,et al.  T-HOG: An effective gradient-based descriptor for single line text regions , 2013, Pattern Recognit..

[14]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[15]  Andrew Zisserman,et al.  Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[16]  Yi Li,et al.  Orientation Robust Text Line Detection in Natural Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Yuning Jiang,et al.  UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[18]  Yonghong Song,et al.  A coarse-to-fine scene text detection method based on Skeleton-cut detector and Binary-Tree-Search based rectification , 2018, Pattern Recognit. Lett..

[19]  Dimosthenis Karatzas,et al.  A fast hierarchical method for multi-script and arbitrary oriented scene text extraction , 2014, International Journal on Document Analysis and Recognition (IJDAR).

[20]  Wenyu Liu,et al.  TextBoxes: A Fast Text Detector with a Single Deep Neural Network , 2016, AAAI.

[21]  Chee Seng Chan,et al.  Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[22]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[23]  Nikos Komodakis,et al.  Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Dimosthenis Karatzas,et al.  TextProposals: A text-specific selective search algorithm for word spotting in the wild , 2016, Pattern Recognit..

[25]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[28]  Wenyu Liu,et al.  A Unified Framework for Multioriented Text Detection and Recognition , 2014, IEEE Transactions on Image Processing.

[29]  Lianwen Jin,et al.  Curved scene text detection via transverse and longitudinal sequence connection , 2019, Pattern Recognit..

[30]  Wenyu Liu,et al.  Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Nikos Komodakis,et al.  LocNet: Improving Localization Accuracy for Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Han Hu,et al.  WordSup: Exploiting Word Annotations for Character Based Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Shijian Lu,et al.  A pooling based scene text proposal technique for scene text reading in the wild , 2018, Pattern Recognit..

[35]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[36]  Kaizhu Huang,et al.  Robust Text Detection in Natural Scene Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Andrew Zisserman,et al.  Deep Features for Text Spotting , 2014, ECCV.

[38]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Andreas Dengel,et al.  ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images , 2011, 2011 International Conference on Document Analysis and Recognition.

[40]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[41]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[42]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[44]  Tao Wang,et al.  End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[45]  Andrew D. Bagdanov,et al.  FAST: Facilitated and Accurate Scene Text Proposals through FCN Guided Pruning , 2017, Pattern Recognit. Lett..

[46]  Lianwen Jin,et al.  Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Shijian Lu,et al.  Text Flow: A Unified Text Detection System in Natural Scene Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[49]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[51]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[52]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[53]  Jiri Matas,et al.  A Method for Text Localization and Recognition in Real-World Images , 2010, ACCV.

[54]  Anil K. Jain,et al.  Text information extraction in images and video: a survey , 2004, Pattern Recognit..

[55]  Jiřı́ Matas,et al.  Real-time scene text localization and recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Fei Yin,et al.  Deep Direct Regression for Multi-oriented Scene Text Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).