Scene Text Detection and Tracking in Video with Background Cues

Detecting scene text in video is valuable for many content-based video applications. In this paper, we present a novel scene text detection and tracking method for videos that effectively exploits cues from the background regions of text. Specifically, we first extract text candidates and potential background regions of text from each video frame. Then, we model the spatial, shape and motion correlations between the text and its background region with a bipartite graph and apply the random walk algorithm to refine the text candidates for improved accuracy. We also present an effective tracking framework for text in video that exploits the temporal correlation of text cues across successive frames, which improves both the precision and the recall of the final text detection result. Experiments on public scene text video datasets demonstrate the state-of-the-art performance of the proposed method.
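The abstract does not say how the bipartite graph is weighted or how the random walk is parameterised. The following is therefore a minimal sketch of a generic random walk with restart over a bipartite graph, not the paper's formulation. One side of the graph holds text candidates and the other holds background regions. The sketch assumes three things:

- a hypothetical precomputed `affinity` matrix, standing in for the combined spatial, shape and motion correlations;
- hypothetical prior scores `text_prior` and `bg_prior`;
- a restart probability of `1 - alpha`.

The function name `random_walk_refine` and all of its parameters are illustrative.

```python
from typing import List, Sequence


def random_walk_refine(
    affinity: Sequence[Sequence[float]],  # hypothetical text-to-background affinities, shape (n_text, n_bg)
    text_prior: Sequence[float],          # hypothetical initial confidence of each text candidate
    bg_prior: Sequence[float],            # hypothetical initial cue strength of each background region
    alpha: float = 0.85,                  # probability of following an edge instead of restarting
    iters: int = 100,
    tol: float = 1e-9,
) -> List[float]:
    """Random walk with restart on a bipartite text/background graph (illustrative sketch)."""
    n_t, n_b = len(text_prior), len(bg_prior)
    n = n_t + n_b
    # Symmetric bipartite adjacency: text nodes are 0..n_t-1, background nodes are n_t..n-1.
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n_t):
        for j in range(n_b):
            w = float(affinity[i][j])
            adj[i][n_t + j] = w
            adj[n_t + j][i] = w
    # Row-normalise into transition probabilities; isolated nodes keep an all-zero row.
    trans = []
    for row in adj:
        s = sum(row)
        trans.append([w / s for w in row] if s > 0 else row[:])
    # Restart distribution: the priors over all nodes, normalised to sum to 1.
    prior = list(text_prior) + list(bg_prior)
    z = sum(prior)
    restart = [p / z for p in prior] if z > 0 else [1.0 / n] * n
    r = restart[:]
    for _ in range(iters):
        # r_new[k] = (1 - alpha) * restart[k] + alpha * sum_m r[m] * trans[m][k]
        new = [(1 - alpha) * restart[k] for k in range(n)]
        for m in range(n):
            if r[m] == 0.0:
                continue
            for k in range(n):
                if trans[m][k]:
                    new[k] += alpha * r[m] * trans[m][k]
        delta = sum(abs(a - b) for a, b in zip(new, r))
        r = new
        if delta < tol:
            break
    # Only the refined text-candidate scores are returned.
    return r[:n_t]


if __name__ == "__main__":
    # Candidates 0 and 1 share a strong background region; candidate 2 has no background support.
    scores = random_walk_refine(
        affinity=[[0.9, 0.1], [0.8, 0.0], [0.0, 0.0]],
        text_prior=[0.5, 0.5, 0.5],
        bg_prior=[1.0, 0.2],
    )
    print(scores)
```

The example has three text candidates with equal priors. A candidate linked to a strongly supported background region ends up with a higher score than one with no background link, because the isolated candidate only receives its restart mass. A threshold on these scores could then discard weakly supported candidates. The threshold value, like the weights, would be an assumption to tune.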
