Paper Information - Detecting Arbitrary Oriented Text in the Wild with a Visual Attention Model


Text embedded in images provides important semantic information about a scene and its content. Detecting text in an unconstrained environment is challenging because characters vary widely in font, size, background, and alignment. We present a novel attention model for detecting arbitrarily oriented and curved scene text. Inspired by the attention mechanisms of the human visual system, our model uses a spatial glimpse network to process the attended area and a recurrent neural network that aggregates information over time to determine the attention movement. Combined with an off-the-shelf region proposal method, the model achieves state-of-the-art performance on the widely cited ICDAR2013 dataset and on the MSRA-TD500 dataset, which contains arbitrarily oriented text.
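
The glimpse-then-move loop described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names (`extract_glimpse`, `run_attention`), the layer sizes, the single-layer tanh recurrence, and the randomly initialised, untrained weights are all assumptions made for illustration. It only shows the control flow of cropping an attended patch, folding it into a recurrent state, and letting that state propose the next attention location.

```python
import numpy as np

def extract_glimpse(image, center, size):
    """Crop a size x size patch around center=(row, col), zero-padding at the borders."""
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    r, c = int(center[0]), int(center[1])
    # After padding by `half`, the patch centred on (r, c) starts at index (r, c).
    return padded[r:r + size, c:c + size]

def run_attention(image, start, steps=4, size=8, hidden_dim=16, seed=0):
    """Hypothetical recurrent attention loop over a 2-D grayscale image."""
    rng = np.random.default_rng(seed)
    # Untrained random weights: assumptions for illustration, not learned parameters.
    w_glimpse = rng.normal(scale=0.1, size=(size * size + 2, hidden_dim))
    w_hidden = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    w_move = rng.normal(scale=0.1, size=(hidden_dim, 2))

    h = np.zeros(hidden_dim)
    loc = np.array(start, dtype=float)
    max_loc = np.array(image.shape, dtype=float) - 1
    trajectory = [tuple(loc)]
    for _ in range(steps):
        patch = extract_glimpse(image, loc, size)
        # Glimpse features combine what was seen with where it was seen.
        features = np.concatenate([patch.ravel(), loc / max_loc])
        # The recurrent state aggregates information across glimpses.
        h = np.tanh(features @ w_glimpse + h @ w_hidden)
        # The state proposes a bounded movement of the attention location.
        move = np.tanh(h @ w_move) * size
        loc = np.clip(loc + move, 0, max_loc)
        trajectory.append(tuple(loc))
    return trajectory, h

image = np.zeros((32, 48))
trajectory, state = run_attention(image, start=(16, 24))
```

Here `trajectory` holds the starting location plus one location per step, and `state` is the final recurrent vector. In a trained system, that state would feed a detection head, and the starting locations would come from the region proposal method.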
