PP-OCR: A Practical Ultra Lightweight OCR System

The Optical Character Recognition (OCR) systems have been widely used in various of application scenarios, such as office automation (OA) systems, factory automations, online educations, map productions etc. However, OCR is still a challenging task due to the various of text appearances and the demand of computational efficiency. In this paper, we propose a practical ultra lightweight OCR system, i.e., PP-OCR. The overall model size of the PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols, respectively. We introduce a bag of strategies to either enhance the model ability or reduce the model size. The corresponding ablation experiments with the real data are also provided. Meanwhile, several pre-trained models for the Chinese and English recognition are released, including a text detector (97K images are used), a direction classifier (600K images are used) as well as a text recognizer (17.9M images are used). Besides, the proposed PP-OCR are also verified in several other language recognition tasks, including French, Korean, Japanese and German. All of the above mentioned models are open-sourced and the codes are available in the GitHub repository, i.e., this https URL.

[1]  Quoc V. Le,et al.  Searching for MobileNetV3 , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Xiang Bai,et al.  An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[4]  Hengshuang Zhao,et al.  GridMask Data Augmentation , 2020, ArXiv.

[5]  Zhi Zhang,et al.  Bag of Tricks for Image Classification with Convolutional Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Strategies From Data , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Canjie Luo,et al.  Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Errui Ding,et al.  Towards Accurate Scene Text Recognition With Semantic Reasoning Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Swagath Venkataramani,et al.  PACT: Parameterized Clipping Activation for Quantized Neural Networks , 2018, ArXiv.

[10]  Hongyi Zhang,et al.  mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[11]  Yi Yang,et al.  Random Erasing Data Augmentation , 2017, AAAI.

[12]  Zheng Huang,et al.  ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).

[13]  Seong Joon Oh,et al.  CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Yong Jae Lee,et al.  Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Ping Liu,et al.  Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Partha Pratim Roy,et al.  ICDAR 2011 Robust Reading Competition - Challenge 1: Reading Text in Born-Digital Images (Web and Email) , 2011, 2011 International Conference on Document Analysis and Recognition.

[17]  Errui Ding,et al.  Chinese Street View Text: Large-Scale Chinese Text Reading With Partially Supervised Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[19]  Quoc V. Le,et al.  Randaugment: Practical automated data augmentation with a reduced search space , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[20]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Ankush Gupta,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Shijian Lu,et al.  ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17) , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[23]  Hanan Samet,et al.  Pruning Filters for Efficient ConvNets , 2016, ICLR.

[24]  Fei Yin,et al.  Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression , 2018, IEEE Transactions on Image Processing.

[25]  Liusheng Huang,et al.  Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline , 2018, ECCV.

[26]  Kai Chen,et al.  Real-time Scene Text Detection with Differentiable Binarization , 2019, AAAI.

[27]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.