Tracking Based Multi-Orientation Scene Text Detection: A Unified Framework With Dynamic Programming

Multi-orientation text detection in scene videos faces many challenges, typically including skew distortion, low contrast, and arbitrary motion. Most conventional video text detection methods process frames individually and therefore achieve limited performance. In this paper, we propose a novel tracking-based multi-orientation scene text detection method that exploits multiple frames within a unified framework via dynamic programming. First, we propose a multi-information fusion-based multi-orientation text detection method that is applied to each frame. It extensively locates possible character candidates and extracts text regions across multiple channels and scales. Second, an optimal tracking trajectory is learned and linked globally over consecutive frames by dynamic programming, which refines the final detection results using all available detection, recognition, and prediction information. Finally, experiments on several public datasets of multi-orientation scene text images and videos, including MSRA-TD500, USTB-SV1K, and ICDAR 2015 Scene Videos, show that the proposed system achieves state-of-the-art performance.
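The abstract does not give the trajectory-linking objective. The following minimal sketch shows how dynamic programming can link per-frame detections globally rather than greedily. It uses a Viterbi-style search that picks one candidate box per frame so as to maximize the summed detection scores plus a temporal-consistency reward between consecutive picks. Several assumptions are made here. Boxes are axis-aligned, whereas the paper targets multi-oriented text. The unary term is a single detection score. The pairwise term is IoU weighted by a hypothetical parameter `lam`. The recognition and prediction cues mentioned above are not modeled. The names `Box`, `iou`, and `link_trajectory` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    # Hypothetical axis-aligned candidate; the paper itself handles oriented text.
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # per-frame detection confidence (assumed in [0, 1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def link_trajectory(frames: List[List[Box]], lam: float = 1.0) -> Tuple[List[int], float]:
    """Choose one candidate index per frame, maximizing
    sum of detection scores + lam * sum of IoU between consecutive choices."""
    if not frames or any(len(f) == 0 for f in frames):
        raise ValueError("every frame needs at least one candidate")
    # best[t][j]: best accumulated score of a trajectory ending at candidate j of frame t
    best: List[List[float]] = [[b.score for b in frames[0]]]
    back: List[List[int]] = [[-1] * len(frames[0])]
    for t in range(1, len(frames)):
        row, ptr = [], []
        for cur in frames[t]:
            # Accumulated score of each predecessor plus the transition reward to `cur`.
            cands = [best[t - 1][i] + lam * iou(prev, cur) for i, prev in enumerate(frames[t - 1])]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            row.append(cands[i_star] + cur.score)
            ptr.append(i_star)
        best.append(row)
        back.append(ptr)
    # Backtrack from the best candidate in the last frame.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for t in range(len(frames) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path, total


frames = [
    [Box(0, 0, 10, 5, 0.9), Box(50, 50, 60, 55, 0.95)],
    [Box(1, 0, 11, 5, 0.6), Box(80, 80, 90, 85, 0.7)],
    [Box(2, 0, 12, 5, 0.8)],
]
path, total = link_trajectory(frames)
print(path)  # [0, 0, 0]
```

In this toy input, a greedy per-frame choice would take the 0.95 box in the first frame. The global search instead prefers the slightly weaker but spatially consistent chain, which is the kind of correction that trajectory-level linking is meant to provide. The cost per step is quadratic in the number of candidates per frame. A real system would typically also allow a "no text" state so that tracks can start and end.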
