A Hierarchical Visual Saliency Model for Character Detection in Natural Scenes

Visual saliency models have been introduced to the field of character recognition for detecting characters in natural scenes. Researchers believe that characters have different visual properties from their non-character neighbors, which make them salient. Under this assumption, characters should respond well to computational models of visual saliency. However, in some situations, characters belonging to scene text might not be as salient as one might expect. For instance, a signboard is usually very salient, but the characters on the signboard might not be salient globally. To analyze this hypothesis in more depth, we first examine how much background regions such as signboards affect saliency-based character detection in natural scenes. We then propose a hierarchical-saliency method for detecting characters in natural scenes. Experiments on a dataset of over 3,000 images containing scene text show that, when saliency alone is used for scene text detection, the proposed hierarchical method captures a larger percentage of text pixels than the conventional single-pass algorithm.
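The signboard example can be made concrete with a small sketch. The abstract does not specify the saliency model or how the passes are chained, so everything below is an assumption made for illustration: a toy on-center contrast measure (local mean minus surrounding mean, negative values clipped) stands in for a real saliency model, the first-pass map is cut at a fixed relative threshold, and saliency is recomputed inside the bounding box of the surviving region. The names `box_blur`, `contrast_saliency`, `hierarchical_saliency` and the parameters `center`, `surround`, `rel_thresh` are hypothetical.

```python
import numpy as np


def box_blur(img, radius):
    """Mean filter over a (2*radius+1)-wide square window, edges replicated."""
    padded = np.pad(img, radius, mode="edge")
    # Integral image with a leading row/column of zeros, so every window
    # sum is four lookups.
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    k = 2 * radius + 1
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)


def contrast_saliency(gray, center=1, surround=8):
    """Toy on-center contrast map scaled to [0, 1] (assumption, not the paper's model)."""
    gray = np.asarray(gray, dtype=float)
    diff = np.clip(box_blur(gray, center) - box_blur(gray, surround), 0.0, None)
    peak = diff.max()
    return diff / peak if peak > 0 else diff


def hierarchical_saliency(gray, rel_thresh=0.5):
    """Two-pass saliency: a global pass, then a second pass inside the salient box.

    Returns (first_pass, second_pass), both with the shape of `gray`.
    The fixed threshold and the bounding-box crop are illustrative assumptions.
    """
    gray = np.asarray(gray, dtype=float)
    first = contrast_saliency(gray)
    second = np.zeros_like(first)
    mask = first >= rel_thresh
    if not mask.any():
        return first, second
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    # Recompute saliency on the crop only, so contrast is judged against the
    # signboard rather than against the whole scene.
    second[y0:y1, x0:x1] = contrast_saliency(gray[y0:y1, x0:x1])
    return first, second


# Synthetic scene: dark background, a bright "signboard", and a slightly
# brighter "character" patch on the board.
scene = np.zeros((64, 64))
scene[16:48, 16:48] = 0.8
scene[30:34, 30:34] = 1.0
first_pass, second_pass = hierarchical_saliency(scene)
```

In this synthetic scene the board's corners dominate the first pass, so the character patch scores well below the threshold. Inside the cropped board it becomes the strongest response of the second pass. Because the toy measure clips negative contrast, it would miss dark text on a bright board; a real saliency model would not have that limitation.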
