A Novel Text Structure Feature Extractor for Chinese Scene Text Detection and Recognition

Scene text information extraction plays an important role in many computer vision applications. Most features used by existing text extraction algorithms apply to only one extraction stage (text detection or text recognition), which significantly weakens consistency in an end-to-end system, especially for complex Chinese text. To tackle this problem, we propose a novel text structure feature extractor for Chinese text, built from a text structure component detector (TSCD) layer and a residual network. Inspired by the three-layer model of human Chinese text cognition, we combine the TSCD layer and the residual network to extract features suitable for both stages. The specialized modeling of Chinese characters in the TSCD layer simulates the key structure component cognition layer of the psychological model, and the residual mechanism of the residual network simulates the key bidirectional connections among the layers of that model. Through this combination, the extracted features are applicable to both text detection and text recognition, mirroring how humans handle both tasks. In evaluation, text detection and recognition models built on the proposed feature extractor both achieve substantial improvements over baseline CNN models. We also design and evaluate an experimental end-to-end Chinese text information extraction system, which shows the advantage of the proposed extractor as a unified feature extractor.
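The idea of one feature extractor serving both stages can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details of the TSCD layer, so `component_layer` below is a hypothetical placeholder, and `residual_block`, `extract_features`, `detection_score`, and `recognition_label` are names assumed purely for illustration. The only technique shown concretely is the standard identity-shortcut residual connection, y = x + F(x), followed by two toy heads that read the same features.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Transform = Callable[[Vector], Vector]


def residual_block(x: Vector, transform: Transform) -> Vector:
    """Identity-shortcut residual connection: returns x + transform(x)."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve the feature dimension")
    return [xi + fi for xi, fi in zip(x, fx)]


def extract_features(
    x: Vector,
    component_layer: Transform,
    transforms: Sequence[Transform],
) -> Vector:
    """Shared extractor: a component layer followed by residual blocks.

    `component_layer` stands in for the TSCD layer (hypothetical placeholder;
    its real form is not described in the abstract).
    """
    h = component_layer(x)
    for transform in transforms:
        h = residual_block(h, transform)
    return h


def detection_score(features: Vector) -> float:
    """Toy detection head: mean activation as a text/non-text score."""
    return sum(features) / len(features)


def recognition_label(features: Vector) -> int:
    """Toy recognition head: index of the largest activation."""
    return max(range(len(features)), key=features.__getitem__)
```

Both heads consume the output of a single `extract_features` call, for example `extract_features([1.0, 2.0], component_layer=list, transforms=[lambda v: [0.5 * a for a in v]])`, so detection and recognition operate on the same representation, which is the consistency property the abstract argues for.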
