Sample-aware Data Augmentor for Scene Text Recognition

Deep neural networks (DNNs) have been widely used in scene text recognition and have achieved remarkable performance. Such DNN-based scene text recognizers usually require large amounts of training data, but data collection and annotation are often expensive in practice. To alleviate this issue, data augmentation is often applied when training scene text recognizers. However, existing data augmentation methods, including affine transformation and elastic transformation methods, suffer from under- and over-diversity because of the complexity of text contents and shapes. In this paper, we propose a sample-aware data augmentor that transforms samples adaptively based on their contents. Specifically, our data augmentor consists of three parts: a gated module, an affine transformation module, and an elastic transformation module. The affine transformation module focuses on keeping the affinity of samples, while the elastic transformation module aims to improve their diversity. With the gated module, our data augmentor determines the transformation type adaptively based on the properties of the training samples and the capability of the recognizer during training. In addition, our framework introduces an adversarial learning strategy to optimize the augmentor and the recognizer jointly. Extensive experiments on scene text recognition benchmarks show that our sample-aware data augmentor significantly improves the performance of state-of-the-art scene text recognizers.
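The gating idea described above can be illustrated with a small sketch. This is not the paper's implementation: it assumes the gate produces two logits (affine, elastic), samples a choice with a Gumbel-softmax relaxation, and applies the chosen transformation to a set of 2D control points rather than to image pixels. All function names, the control-point representation, and the parameter shapes are hypothetical, and in a real augmentor the logits, affine parameters, and offsets would be predicted by a network from each sample instead of being passed in.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample over transformation types (assumed gate mechanism)."""
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))          # Gumbel(0, 1) noise
    y = (logits + gumbel) / tau
    y = y - y.max()                       # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum()

def affine_points(points, params):
    """Apply a 2x3 affine matrix (flattened to 6 values) to (N, 2) points."""
    matrix = params.reshape(2, 3)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return homogeneous @ matrix.T

def elastic_points(points, offsets):
    """Move each control point independently by its own (dx, dy) offset."""
    return points + offsets

def augment_control_points(points, gate_logits, affine_params, offsets,
                           tau=1.0, rng=None):
    """Pick one transformation type per sample and apply it.

    Index 0 of gate_logits is taken to mean 'affine', index 1 'elastic'
    (an arbitrary convention chosen for this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = gumbel_softmax(gate_logits, tau, rng)
    if int(np.argmax(probs)) == 0:
        return "affine", affine_points(points, affine_params)
    return "elastic", elastic_points(points, offsets)

if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
    shear = np.array([1.0, 0.2, 0.0, 0.0, 1.0, 0.0])   # mild horizontal shear
    jitter = np.random.default_rng(1).normal(scale=0.05, size=pts.shape)
    kind, moved = augment_control_points(pts, np.array([0.3, -0.1]), shear, jitter)
    print(kind)
    print(moved)
```

In this sketch the affine branch moves all control points with one shared matrix, which keeps the overall shape of the text, while the elastic branch moves each point separately, which allows more varied deformations. The temperature `tau` controls how close the relaxed sample is to a hard one-hot choice.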
