Detecting Uyghur text in complex background images with convolutional neural network

Uyghur text detection is crucial to a variety of real-world applications, yet it has received little research attention. In this paper, we develop an effective and efficient region-based convolutional neural network for detecting Uyghur text in complex background images. The network has three main characteristics: (1) three region proposal networks operate simultaneously on feature maps from different convolutional layers to improve recall; (2) the overall architecture is fully convolutional, with global average pooling replacing the fully connected layers in the classification and bounding-box regression layers; and (3) to make full use of baseline information, the network detects Uyghur text lines directly in an end-to-end fashion. Experimental results on a benchmark dataset show that our method achieves an F-measure of 0.83 with a detection time of 0.6 s per image on a single K20c GPU, which is much faster than state-of-the-art methods while maintaining competitive accuracy.
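To make the idea of replacing fully connected layers with global average pooling (GAP) concrete, here is a minimal NumPy sketch of a fully convolutional region-of-interest head. The abstract does not give the layer configuration, so the 1x1 convolutions, the 256x7x7 RoI feature size, the two-class (text/background) output, and the four box-regression deltas are assumptions for illustration, and all function names are hypothetical.

```python
import numpy as np

def conv1x1(feat, weight, bias):
    """1x1 convolution: feat (C_in, H, W), weight (C_out, C_in), bias (C_out,)."""
    # Contract over the input-channel axis independently at every spatial position.
    return np.einsum("oc,chw->ohw", weight, feat) + bias[:, None, None]

def global_average_pool(feat):
    """Average each channel over its spatial extent: (C, H, W) -> (C,)."""
    return feat.mean(axis=(1, 2))

def gap_head(roi_feat, w_cls, b_cls, w_reg, b_reg):
    """Hypothetical fully convolutional RoI head (assumed design, not the paper's).

    Classification: 1x1 conv to 2 channels (text / background), then GAP and softmax.
    Regression: 1x1 conv to 4 channels (assumed dx, dy, dw, dh), then GAP.
    """
    cls_logits = global_average_pool(conv1x1(roi_feat, w_cls, b_cls))
    # Numerically stable softmax over the two classes.
    exp = np.exp(cls_logits - cls_logits.max())
    cls_prob = exp / exp.sum()
    box_deltas = global_average_pool(conv1x1(roi_feat, w_reg, b_reg))
    return cls_prob, box_deltas

rng = np.random.default_rng(0)
C, H, W = 256, 7, 7  # assumed RoI feature size; not stated in the abstract
roi_feat = rng.standard_normal((C, H, W))
w_cls, b_cls = rng.standard_normal((2, C)) * 0.01, np.zeros(2)
w_reg, b_reg = rng.standard_normal((4, C)) * 0.01, np.zeros(4)
probs, deltas = gap_head(roi_feat, w_cls, b_cls, w_reg, b_reg)
print(probs.shape, deltas.shape)  # (2,) (4,)
```

Because GAP has no parameters, each branch needs only C x K weights (K = 2 or 4) instead of the (C·H·W) x K of a single fully connected layer over the flattened RoI, and the head accepts RoI feature maps of any spatial size. Since both the 1x1 convolution and the averaging are linear, pooling first and then applying a linear layer gives the same result; the convolution-first order is kept here to mirror the fully convolutional form.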
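The abstract states that three region proposal networks are used to improve recall but does not say how their outputs are combined. One plausible scheme, assumed here and not taken from the paper, is to concatenate the boxes and scores from all RPNs and de-duplicate them with greedy non-maximum suppression (NMS). The IoU threshold of 0.7, the cap of 300 proposals, and the helper names are hypothetical choices for this sketch.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh):
    """Greedy NMS: repeatedly keep the best box and drop boxes overlapping it."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep

def merge_rpn_proposals(proposal_sets, iou_thresh=0.7, top_k=300):
    """Pool (boxes, scores) pairs from several RPNs (assumed scheme) and de-duplicate."""
    boxes = np.concatenate([b for b, _ in proposal_sets], axis=0)
    scores = np.concatenate([s for _, s in proposal_sets], axis=0)
    keep = nms(boxes, scores, iou_thresh)[:top_k]
    return boxes[keep], scores[keep]

# Toy proposals from three hypothetical RPNs attached to different layers.
rpn_a = (np.array([[10, 10, 110, 40], [200, 50, 300, 80]], float), np.array([0.9, 0.6]))
rpn_b = (np.array([[12, 11, 112, 41]], float), np.array([0.8]))  # near-duplicate of rpn_a's first box
rpn_c = (np.array([[400, 20, 500, 60]], float), np.array([0.7]))
boxes, scores = merge_rpn_proposals([rpn_a, rpn_b, rpn_c])
print(scores)  # [0.9 0.7 0.6]
```

Pooling proposals before suppression means a text line missed by one RPN can still be recovered from another, which is where the recall gain would come from under this assumption; the cost is more candidates to score downstream, which `top_k` bounds.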
