Scene Text Recognition with Sliding Convolutional Character Models

Scene text recognition has attracted great interests from the computer vision and pattern recognition community in recent years. State-of-the-art methods use concolutional neural networks (CNNs), recurrent neural networks with long short-term memory (RNN-LSTM) or the combination of them. In this paper, we investigate the intrinsic characteristics of text recognition, and inspired by human cognition mechanisms in reading texts, we propose a scene text recognition method with character models on convolutional feature map. The method simultaneously detects and recognizes characters by sliding the text line image with character models, which are learned end-to-end on text line images labeled with text transcripts. The character classifier outputs on the sliding windows are normalized and decoded with Connectionist Temporal Classification (CTC) based algorithm. Compared to previous methods, our method has a number of appealing properties: (1) It avoids the difficulty of character segmentation which hinders the performance of segmentation-based recognition methods; (2) The model can be trained simply and efficiently because it avoids gradient vanishing/exploding in training RNN-LSTM based models; (3) It bases on character models trained free of lexicon, and can recognize unknown words. (4) The recognition process is highly parallel and enables fast recognition. Our experiments on several challenging English and Chinese benchmarks, including the IIIT-5K, SVT, ICDAR03/13 and TRW15 datasets, demonstrate that the proposed method yields superior or comparable performance to state-of-the-art methods while the model size is relatively small.

[1]  Andrew Zisserman,et al.  Reading Text in the Wild with Convolutional Neural Networks , 2014, International Journal of Computer Vision.

[2]  Yoshua Bengio,et al.  Online and offline handwritten Chinese character recognition: A comprehensive study and new benchmark , 2016, Pattern Recognit..

[3]  Simon Osindero,et al.  Recursive Recurrent Nets with Attention Modeling for OCR in the Wild , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  C. V. Jawahar,et al.  Scene Text Recognition using Higher Order Language Priors , 2009, BMVC.

[5]  Xiang Bai,et al.  Robust Scene Text Recognition with Automatic Rectification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[7]  S. Lucas,et al.  ICDAR 2003 robust reading competitions: entries, results, and future directions , 2005, International Journal of Document Analysis and Recognition (IJDAR).

[8]  Wenyu Liu,et al.  Strokelets: A Learned Multi-scale Representation for Scene Text Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Tatiana Novikova,et al.  Large-Lexicon Attribute-Consistent Text Recognition in Natural Images , 2012, ECCV.

[10]  Allen R. Hanson,et al.  Scene Text Recognition Using Similarity and a Lexicon with Sparse Belief Propagation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Rafael C. González,et al.  Digital image processing, 3rd Edition , 2008 .

[12]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[13]  Andrew Zisserman,et al.  Deep Features for Text Spotting , 2014, ECCV.

[14]  Chunheng Wang,et al.  Scene Text Recognition Using Part-Based Tree-Structured Character Detection , 2013, CVPR 2013.

[15]  Seong-Whan Lee,et al.  Integrated segmentation and recognition of handwritten numerals with cascade neural network , 1999, IEEE Trans. Syst. Man Cybern. Part C.

[16]  Yunhong Wang,et al.  Random Projected Convolutional Feature for Scene Text Recognition , 2016, 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[17]  Marcin Junczys-Dowmunt,et al.  The United Nations Parallel Corpus v1.0 , 2016, LREC.

[18]  C. V. Jawahar,et al.  Whole is Greater than Sum of Parts: Recognizing Scene Text Words , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[19]  Fei Yin,et al.  Handwritten Chinese Text Recognition by Integrating Multiple Contexts , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Shijian Lu,et al.  Accurate Scene Text Recognition Based on Recurrent Neural Network , 2014, ACCV.

[21]  Hartmut Neven,et al.  PhotoOCR: Reading Text in Uncontrolled Conditions , 2013, 2013 IEEE International Conference on Computer Vision.

[22]  Ernest Valveny,et al.  ICDAR 2015 competition on Robust Reading , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[23]  Tao Wang,et al.  End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[24]  Navdeep Jaitly,et al.  Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.

[25]  C. V. Jawahar,et al.  Top-down and bottom-up cues for scene text recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Qiang Guo,et al.  Hybrid CNN-HMM Model for Street View House Number Recognition , 2014, ACCV Workshops.

[27]  David S. Doermann,et al.  Text Detection and Recognition in Imagery: A Survey , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Xiang Bai,et al.  An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Kai Wang,et al.  End-to-end scene text recognition , 2011, 2011 International Conference on Computer Vision.

[30]  James A. Pittman,et al.  Integrated Segmentation and Recognition Through Exhaustive Scans or Learned Saccadic Jumps , 1993, Int. J. Pattern Recognit. Artif. Intell..

[31]  Andrew Zisserman,et al.  Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition , 2014, ArXiv.

[32]  Ernest Valveny,et al.  Word Spotting and Recognition with Embedded Attributes , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Andrew Zisserman,et al.  Deep Structured Output Learning for Unconstrained Text Recognition , 2014, ICLR.

[34]  Pan He,et al.  Reading Scene Text in Deep Convolutional Sequences , 2015, AAAI.

[35]  Albert Gordo,et al.  Supervised mid-level features for word image representation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Joelle Pineau,et al.  End-to-End Text Recognition with Hybrid HMM Maxout Models , 2013, ICLR.

[37]  Shuchang Zhou,et al.  ICDAR 2015 Text Reading in the Wild Competition , 2015, ArXiv.

[38]  José A. Rodríguez-Serrano,et al.  Label embedding for text recognition , 2013, BMVC.