Text-Edge-Box: An Object Proposal Approach for Scene Texts Localization

Text proposal has been gaining interest in recent years due to the great success of object proposal in category-independent object localization. In this paper, we present a novel text-specific proposal technique that provides superior bounding boxes for accurate text localization in scenes. The proposed technique, which we call Text Edge Box (TEB), takes a binary edge map, a gradient map and an orientation map of an image as inputs. Connected components are first found within the binary edge map and are then scored by two proposed low-cue text features, extracted from the gradient map and the orientation map, respectively. These scores represent the text probability of the connected components and are aggregated in a text edge image. Scene text proposals are finally generated by grouping the connected components and estimating their likelihood of being words. The proposed TEB has been evaluated on two public scene text datasets: the Robust Reading Competition 2013 (ICDAR 2013) dataset and the Street View Text (SVT) dataset. Experiments show that the proposed TEB substantially outperforms state-of-the-art techniques.
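The first stages of the pipeline described above (deriving the three maps, finding connected components in the binary edge map, and attaching a score to each component) can be illustrated with a minimal sketch. This is not the TEB implementation: the abstract does not define the two text features, so `placeholder_score` (mean gradient magnitude) is a hypothetical stand-in, and the central-difference gradient, the threshold-based edge map, the 8-connectivity, and all function and parameter names are assumptions made for illustration only. The grouping of components into word proposals is not shown.

```python
import math
from collections import deque


def gradient_maps(img):
    """Central-difference gradient magnitude and orientation (radians)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ori = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp indices at the borders so every pixel has a gradient.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ori[y][x] = math.atan2(gy, gx)
    return mag, ori


def connected_components(edges):
    """Collect 8-connected components of a binary edge map using BFS."""
    h, w = len(edges), len(edges[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not edges[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, comp = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and edges[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(comp)
    return comps


def placeholder_score(comp, mag):
    """Hypothetical stand-in for a text feature: mean gradient magnitude."""
    return sum(mag[y][x] for y, x in comp) / len(comp)


def score_components(img, edge_threshold=0.5):
    """Return (component, score) pairs for a grayscale image (list of rows)."""
    # The orientation map is computed to mirror the three inputs named in
    # the abstract, but the placeholder score below does not use it.
    mag, ori = gradient_maps(img)
    # Assumption: a simple magnitude threshold stands in for the edge detector.
    edges = [[m > edge_threshold for m in row] for row in mag]
    comps = connected_components(edges)
    return [(comp, placeholder_score(comp, mag)) for comp in comps]
```

Calling `score_components` on a small grayscale image given as a list of rows yields one `(pixels, score)` pair per connected edge component; a real text feature would replace `placeholder_score` and would also draw on the orientation map.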
