Text Detection and Recognition for Natural Scene Images Using Deep Convolutional Neural Networks

Text is among the most indispensable sources of information in human life, so analyzing and understanding its meaning is very important. Compared with general visual elements, text conveys rich, high-level semantic information, which helps computers better understand the semantic content of images. With the rapid development of computer technology, great progress has been made in text detection and recognition. However, detecting and recognizing text characters in natural scene images is still limited: natural scene images contain more interference and complexity than conventional document images, and these factors make scene text detection and recognition challenging. To address this problem, this paper proposes a new text detection and recognition method for natural scene images based on deep convolutional neural networks. For text detection, the method uses a ResNet network to extract high-level visual features from the raw pixels, uses a BLSTM layer to extract contextual features from character sequences, and then adopts the vertical anchor mechanism derived from Faster R-CNN to find the bounding boxes of the detected text, which effectively improves text detection. For text recognition, a DenseNet-based character recognition model is built with Keras, and a Softmax output layer classifies each character. Our method replaces hand-crafted features with automatically learned, context-based features, improving the efficiency and accuracy of recognition and enabling text detection and recognition in natural scene images. Experiments on the PAC2018 competition platform achieved good results.
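
The two ends of the pipeline described above can be made concrete with a small sketch. This is not the authors' code: the detection part follows the vertical-anchor scheme of the Connectionist Text Proposal Network (CTPN), which adapts Faster R-CNN anchors to text by fixing the anchor width and regressing only the vertical centre and height, and the recognition part is a plain per-character softmax classifier. The 16 px anchor width (also used as the feature stride), the 10 anchor heights growing from 11 px by a factor of 1/0.7, and the 36-symbol ALPHABET are assumptions chosen for illustration. The ResNet, BLSTM and DenseNet stages that would produce the regression outputs and character scores are omitted.

```python
import math
from typing import List, Sequence, Tuple

# Detection: CTPN-style vertical anchors (assumed hyperparameters, not from the paper)
ANCHOR_WIDTH = 16    # fixed anchor width in pixels, also the assumed feature stride
BASE_HEIGHT = 11.0   # smallest anchor height in pixels
HEIGHT_RATIO = 0.7   # each height is the previous one divided by this
NUM_ANCHORS = 10     # anchors per feature-map cell


def anchor_heights() -> List[float]:
    """Return NUM_ANCHORS heights that grow geometrically from BASE_HEIGHT."""
    return [BASE_HEIGHT / HEIGHT_RATIO ** k for k in range(NUM_ANCHORS)]


def anchors_for_cell(col: int, row: int) -> List[Tuple[float, float, float, float]]:
    """Anchors (x1, y1, x2, y2) sharing one fixed width, centred on a feature-map cell."""
    cx = col * ANCHOR_WIDTH + ANCHOR_WIDTH / 2
    cy = row * ANCHOR_WIDTH + ANCHOR_WIDTH / 2
    half_w = ANCHOR_WIDTH / 2
    return [(cx - half_w, cy - h / 2, cx + half_w, cy + h / 2) for h in anchor_heights()]


def encode(cy: float, h: float, cy_a: float, h_a: float) -> Tuple[float, float]:
    """Regression targets (v_c, v_h): only the vertical centre and height are learned."""
    return (cy - cy_a) / h_a, math.log(h / h_a)


def decode(v_c: float, v_h: float, cy_a: float, h_a: float) -> Tuple[float, float]:
    """Invert encode() to recover a predicted centre and height in pixels."""
    return v_c * h_a + cy_a, math.exp(v_h) * h_a


# Recognition: per-character softmax classification
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # hypothetical character set


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_characters(per_char_logits: Sequence[Sequence[float]]) -> str:
    """Assign each character position the symbol with the highest softmax probability."""
    chars = []
    for logits in per_char_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        chars.append(ALPHABET[best])
    return "".join(chars)


if __name__ == "__main__":
    # A text slice centred at y = 40 px with height 30 px, matched to the
    # fourth anchor of cell (col=2, row=2); encode/decode should round-trip.
    h_a = anchor_heights()[3]
    cy_a = 2 * ANCHOR_WIDTH + ANCHOR_WIDTH / 2
    v_c, v_h = encode(40.0, 30.0, cy_a, h_a)
    print(decode(v_c, v_h, cy_a, h_a))
```

Because every anchor shares the same width, the detector only has to regress two vertical quantities per anchor. In a full CTPN-style detector, neighbouring fixed-width proposals would then be linked into text lines, a step this sketch leaves out.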

[1]  Ruohan Meng A fusion steganographic algorithm based on Faster R-CNN , 2018 .

[2]  Junjie Yan,et al.  FOTS: Fast Oriented Text Spotting with a Unified Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Kilian Q. Weinberger,et al.  CondenseNet: An Efficient DenseNet Using Learned Group Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Yuting Gao,et al.  Fused Text Segmentation Networks for Multi-oriented Scene Text Detection , 2017, 2018 24th International Conference on Pattern Recognition (ICPR).

[5]  Pan He,et al.  Detecting Text in Natural Image with Connectionist Text Proposal Network , 2016, ECCV.

[6]  Meng Yang,et al.  Large-Margin Softmax Loss for Convolutional Neural Networks , 2016, ICML.

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Jiri Matas,et al.  FASText: Efficient Unconstrained Scene Text Detector , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[9]  Limin Wang,et al.  Places205-VGGNet Models for Scene Recognition , 2015, ArXiv.

[10]  Xiang Bai,et al.  Symmetry-based text line detection in natural scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Francesca Murabito,et al.  Superpixel-based video object segmentation using perceptual organization and location prior , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Naigong Yu,et al.  Handwritten digits recognition base on improved LeNet5 , 2015, The 27th Chinese Control and Decision Conference (2015 CCDC).

[14]  Jiri Matas,et al.  Efficient Scene text localization and recognition with local character refinement , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).

[15]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Wenyu Liu,et al.  A Unified Framework for Multioriented Text Detection and Recognition , 2014, IEEE Transactions on Image Processing.

[17]  Xiaogang Wang,et al.  Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[19]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[20]  Weilin Huang,et al.  Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors , 2013, 2013 IEEE International Conference on Computer Vision.

[21]  Jon Almazán,et al.  ICDAR 2013 Robust Reading Competition , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[22]  Kaizhu Huang,et al.  Robust Text Detection in Natural Scene Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Antonio Torralba,et al.  Inverting and Visualizing Features for Object Detection , 2012, ArXiv.

[24]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[25]  Matthieu Cord,et al.  Snoopertext: A multiresolution system for text detection in complex visual scenes , 2010, 2010 IEEE International Conference on Image Processing.

[26]  Nicu Sebe,et al.  Evaluation of Intensity and Color Corner Detectors for Affine Invariant Salient Regions , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[27]  Ming-Hsuan Yang,et al.  Kernel Eigenfaces vs. Kernel Fisherfaces: Face recognition using kernel methods , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.