MA-CRNN: a multi-scale attention CRNN for Chinese text line recognition in natural scenes

Chinese text line recognition is an important component of optical character recognition and has been widely applied in many tasks. However, several challenges remain: (1) a lack of open Chinese text recognition datasets; (2) the characteristics of Chinese characters themselves, e.g., diverse character types, complex structures, and varying sizes; (3) the variability of text images across scenes, e.g., blur, uneven illumination, and distortion. To address these challenges, we propose an end-to-end recognition method based on convolutional recurrent neural networks (CRNNs), the multi-scale attention CRNN (MA-CRNN), which adds three components to a CRNN: asymmetric convolution, a feature reuse network, and an attention mechanism. The proposed model mainly targets scene text recognition that includes Chinese characters. The model is trained and tested on two Chinese text recognition datasets: the open dataset MTWI and a large-scale Chinese text line dataset that we constructed from various scenes. The experimental results show that the proposed method achieves better performance than the other methods it is compared with.
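The abstract names asymmetric convolution as one of the added components but does not say which variant is used or what kernel sizes are chosen. As a rough illustration of the general idea only, the sketch below assumes the common factorized form, in which a 3x1 filter followed by a 1x3 filter stands in for a single 3x3 filter. The filter values, the toy image, and the helper name `correlate2d_valid` are hypothetical and are not taken from the paper.

```python
def correlate2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical filters, chosen only for illustration.
col = [[1], [2], [1]]   # 3x1 vertical kernel
row = [[1, 0, -1]]      # 1x3 horizontal kernel

# The 3x3 kernel that the pair is equivalent to (outer product of col and row).
full = [[c[0] * r for r in row[0]] for c in col]

# Small deterministic 5x6 integer "image".
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]

# Apply the asymmetric pair in sequence, then the single 3x3 kernel.
two_step = correlate2d_valid(correlate2d_valid(image, col), row)
one_step = correlate2d_valid(image, full)

# With no nonlinearity in between, both routes give the same feature map.
assert two_step == one_step

# Weights per input/output channel pair.
params_full = 3 * 3          # one 3x3 kernel
params_asym = 3 * 1 + 1 * 3  # one 3x1 plus one 1x3 kernel
print(params_full, params_asym)
```

The equality holds here because the two filters are composed linearly, so the pair can only represent 3x3 kernels that are an outer product of a column and a row. In a real network a nonlinearity usually sits between the two layers, so the pair is no longer an exact substitute for one 3x3 kernel; the appeal is the smaller weight count and the ability to use different receptive extents along height and width, which is plausibly useful for wide text line images.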
