Paper information - End-to-end text recognition with convolutional neural networks

Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully hand-engineered features or large amounts of prior knowledge. In this paper, we take a different route and combine the representational power of large, multilayer neural networks with recent developments in unsupervised feature learning, which allows us to use a common framework to train highly accurate text detector and character recognizer modules. Then, using only simple off-the-shelf methods, we integrate these two modules into a full end-to-end, lexicon-driven scene text recognition system that achieves state-of-the-art performance on two standard benchmarks, Street View Text and ICDAR 2003.
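To make "lexicon-driven" concrete, the sketch below shows one possible way a character recognizer's outputs could be matched against a word list. It is an illustration under stated assumptions, not the integration method used in the paper: the score matrix layout, the lowercase `ALPHABET`, and the helper names `alignment_score` and `recognize` are all hypothetical. It assumes the recognizer has been slid across a candidate word region to give one score per character class per window position, and it picks the lexicon word whose characters can be placed at strictly increasing positions with the highest total score.

```python
import numpy as np

# Hypothetical character set; the real system's class set may differ.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def alignment_score(scores, word):
    """Best total score for placing the characters of `word` at strictly
    increasing window positions.

    scores: array of shape (len(ALPHABET), n_positions), where
            scores[k, j] is the recognizer's score for class k at position j
            (assumed layout, for illustration only).
    """
    n_pos = scores.shape[1]
    if len(word) > n_pos:
        return float("-inf")  # not enough positions to place every character

    # best[j]: best score so far with the latest character placed at position j
    best = scores[ALPHABET.index(word[0])].astype(float).copy()
    for ch in word[1:]:
        row = scores[ALPHABET.index(ch)]
        # prev_max[j]: best score with the previous character at any position <= j
        prev_max = np.maximum.accumulate(best)
        new = np.full(n_pos, -np.inf)
        # the current character must sit strictly to the right of the previous one
        new[1:] = prev_max[:-1] + row[1:]
        best = new
    return float(best.max())


def recognize(scores, lexicon):
    """Return the lexicon word with the highest alignment score."""
    return max(lexicon, key=lambda w: alignment_score(scores, w))


if __name__ == "__main__":
    # Toy score matrix: low background score, high scores where "c", "a", "t" appear.
    toy = -np.ones((len(ALPHABET), 6))
    for ch, pos in zip("cat", (0, 2, 4)):
        toy[ALPHABET.index(ch), pos] = 1.0
    print(recognize(toy, ["car", "at", "cat"]))
```

Because the score is a plain sum, this sketch favors longer words whenever scores are mostly positive; a practical system would need some form of normalization or a penalty for unexplained positions, which is left out here.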
