Paper Information - Deep Features for Text Spotting

The goal of this work is text spotting in natural images. The problem is divided into two sequential tasks: detecting word regions in the image, and recognizing the words within these regions. We make the following contributions. First, we develop a Convolutional Neural Network (CNN) classifier that can be used for both tasks. The CNN has a novel architecture that enables efficient feature sharing (by using a number of layers in common) for text detection, case-sensitive and case-insensitive character classification, and bigram classification. It exceeds state-of-the-art performance on all of these. Second, we make a number of technical changes to the traditional CNN architecture, including no downsampling, which allows a per-pixel sliding window, and multi-mode learning with a mixture of linear models (maxout). Third, we present a method for automated data mining of Flickr that generates word-level and character-level annotations. Finally, these components are combined to form an end-to-end, state-of-the-art text spotting system. We evaluate the text spotting system on two standard benchmarks, the ICDAR Robust Reading data set and the Street View Text data set, and demonstrate improvements over the state of the art on multiple measures.
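The phrase "mixture of linear models (maxout)" can be made concrete with a small sketch. A maxout unit computes several linear responses to the same input and outputs the largest one. The code below is only an illustration of that general idea in plain Python; the function names `linear` and `maxout_unit` and the example weights are assumptions made for this sketch and are not taken from the paper's network.

```python
def linear(weights, bias, x):
    """Return one linear response: the dot product of weights and x, plus bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def maxout_unit(x, pieces):
    """Return the maximum over several linear models applied to the same input.

    pieces is a list of (weights, bias) pairs (hypothetical values, for
    illustration only). Each pair is one linear "mode" and the unit outputs
    whichever mode responds most strongly.
    """
    if not pieces:
        raise ValueError("a maxout unit needs at least one linear piece")
    return max(linear(w, b, x) for w, b in pieces)


# Two pieces with weights +1 and -1 and zero bias reproduce the absolute value,
# which shows how a maxout unit yields a piecewise-linear activation.
abs_pieces = [([1.0], 0.0), ([-1.0], 0.0)]
print(maxout_unit([-3.0], abs_pieces))
print(maxout_unit([2.5], abs_pieces))
```

With more pieces the unit can approximate other convex piecewise-linear functions, which is one common way to read the "multi-mode" description.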
