Object Reading: Text Recognition for Object Recognition

We propose using text recognition to aid visual object class recognition. To this end, we first propose a new algorithm for text detection in natural images. The proposed text detection is based on saliency cues and a context fusion step. The algorithm requires no parameter tuning and can deal with varying imaging conditions. We evaluate on three tasks: (1) scene text recognition, where we improve on the state of the art by 0.17 on the ICDAR 2003 dataset; (2) saliency-based object recognition, where we outperform other state-of-the-art saliency methods for object recognition on the PASCAL VOC 2011 dataset; and (3) object recognition with the aid of recognized text, where we are the first to report multi-modal results on the IMET set. The results show that text helps object class recognition if the text is not uniquely coupled to individual object instances.
