Exploring Font-independent Features for Scene Text Recognition

Scene text recognition (STR) has been extensively studied in the last few years. Many recently proposed methods are specially designed to accommodate the arbitrary shapes, layouts, and orientations of scene text, but they ignore that varied font (or writing) styles also pose severe challenges to STR. These methods, in which the font features and content features of characters are entangled, perform poorly on scene images containing text in novel font styles. To address this problem, we explore font-independent features of scene text via attentional generation of glyphs in a large number of font styles. Specifically, we introduce trainable font embeddings to shape the font styles of the generated glyphs, so that the image feature of the scene text represents only its essential patterns. The generation process is directed by a spatial attention mechanism, which effectively copes with irregular text and generates higher-quality glyphs than existing image-to-image translation methods. Experiments conducted on several STR benchmarks demonstrate the superiority of our method over the state of the art.
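The separation of font and content described above can be illustrated with a minimal sketch. It is not the paper's implementation: the dot-product attention scoring, the dimensions, the random values standing in for learned parameters, and all names (`spatial_attention`, `glyph_decoder_input`, `font_embeddings`) are assumptions made for illustration. The sketch only shows that a content vector attended from the image feature map is shared across fonts, while a per-font embedding is the only part that changes.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(feature_map, query):
    """Attend over flattened spatial positions.

    feature_map: list of H*W feature vectors (one per spatial position).
    query: vector of the same dimension (e.g. a decoder state for one character).
    Returns (attention weights, attended content vector).
    Dot-product scoring is an assumption; the abstract does not specify it.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in feature_map]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * feat[d] for w, feat in zip(weights, feature_map))
        for d in range(dim)
    ]
    return weights, context


def glyph_decoder_input(feature_map, query, font_embeddings, font_id):
    """Concatenate the font-independent content vector with one font embedding.

    A glyph generator (not shown) would consume this vector; only the
    embedding part depends on the target font.
    """
    _, context = spatial_attention(feature_map, query)
    return context + font_embeddings[font_id]


random.seed(0)
DIM, FONT_DIM, NUM_FONTS = 4, 2, 3

# Hypothetical 2x3 feature map, flattened to 6 positions.
feature_map = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(6)]
query = [random.gauss(0, 1) for _ in range(DIM)]
# Random values stand in for trainable font embeddings.
font_embeddings = [
    [random.gauss(0, 1) for _ in range(FONT_DIM)] for _ in range(NUM_FONTS)
]

weights, context = spatial_attention(feature_map, query)
inputs = [
    glyph_decoder_input(feature_map, query, font_embeddings, k)
    for k in range(NUM_FONTS)
]
```

In this sketch the first `DIM` entries of every element of `inputs` are identical, which mirrors the intent that the image feature carries only content while the embedding carries font style.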
