PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System

Optical Character Recognition (OCR) systems are widely used in a variety of application scenarios, yet designing an OCR system remains a challenging task. In previous work, we proposed a practical ultra lightweight OCR system (PP-OCR) that balances accuracy against efficiency. To improve the accuracy of PP-OCR while keeping its high efficiency, in this paper we propose a more robust OCR system, PP-OCRv2. We introduce a bag of tricks to train a better text detector and a better text recognizer, including Collaborative Mutual Learning (CML), CopyPaste, Lightweight CPU Network (PP-LCNet), Unified-Deep Mutual Learning (U-DML), and Enhanced CTCLoss. Experiments on real data show that the precision of PP-OCRv2 is 7% higher than that of PP-OCR under the same inference cost. It is also comparable to the server models of PP-OCR, which use ResNet-series backbones. All of the above-mentioned models are open-sourced, and the code is available in the GitHub repository PaddleOCR, which is powered by PaddlePaddle.
