A New Unified Method for Detecting Text from Marathon Runners and Sports Players in Video

Abstract: Detecting text on the torsos of marathon runners and sports players in video is challenging because of poor video quality and the adverse effects of flexible, colorful clothing and of variations in body structure and action. This paper presents a new unified method for tackling these challenges. The proposed method fuses the gradient magnitude and direction coherence of text pixels in a new way to detect candidate regions. The candidate regions are used to determine the number of temporal frame clusters obtained by K-means clustering on frame differences, which in turn yields the key frames. The method then applies Bayesian probability to color values at both the pixel and component levels of the temporal frames to identify skin portions, producing fused images with skin components. Based on this skin information, the method detects faces and torsos by finding structural and spatial coherence between them. We further propose an adaptive pixel-linking deep learning model for text detection in torso regions. The proposed method is evaluated on our own dataset collected from marathon/sports video and on three standard marathon image datasets, namely RBNR, MMM and R-ID. In addition, it is tested on two standard natural scene text datasets, CTW1500 and MS-COCO text, to demonstrate its generality. A comparative study with state-of-the-art methods for bib number/text detection on the different datasets shows that the proposed method outperforms the existing methods.
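The abstract does not state the color space, likelihood model, or priors used in the Bayesian skin step, so the following is only a minimal sketch of how a pixel-level and component-level skin probability could be computed. It assumes quantized RGB histograms as class likelihoods, add-one smoothing, an equal prior, and a component score taken as the mean of its pixel posteriors. All function names, the bin count, and the sample colors are hypothetical and are not taken from the paper.

```python
from collections import Counter

BINS = 8  # assumed number of bins per RGB channel (not from the paper)


def quantize(rgb, bins=BINS):
    """Map an (r, g, b) triple with 0-255 channels to a coarse color bin."""
    step = 256 // bins
    return tuple(channel // step for channel in rgb)


def fit_histogram(pixels, bins=BINS):
    """Count how often each quantized color occurs in a list of RGB pixels."""
    return Counter(quantize(p, bins) for p in pixels)


def skin_posterior(rgb, skin_hist, non_skin_hist, prior_skin=0.5, bins=BINS):
    """Return P(skin | color) by Bayes' rule.

    Likelihoods are add-one smoothed histogram frequencies, so a color
    never seen in training falls back to the prior.
    """
    n_bins = bins ** 3
    key = quantize(rgb, bins)
    like_skin = (skin_hist[key] + 1) / (sum(skin_hist.values()) + n_bins)
    like_non = (non_skin_hist[key] + 1) / (sum(non_skin_hist.values()) + n_bins)
    numerator = like_skin * prior_skin
    return numerator / (numerator + like_non * (1.0 - prior_skin))


def component_skin_score(pixels, skin_hist, non_skin_hist):
    """Component-level score: mean pixel posterior (an assumed aggregation)."""
    scores = [skin_posterior(p, skin_hist, non_skin_hist) for p in pixels]
    return sum(scores) / len(scores)


# Hypothetical labeled samples, for illustration only.
skin_samples = [(224, 172, 105), (198, 134, 66), (241, 194, 125)]
non_skin_samples = [(20, 60, 200), (30, 180, 40), (250, 250, 250)]

skin_hist = fit_histogram(skin_samples)
non_skin_hist = fit_histogram(non_skin_samples)

p_skin_like = skin_posterior((230, 180, 110), skin_hist, non_skin_hist)
p_blue = skin_posterior((22, 62, 205), skin_hist, non_skin_hist)
component = component_skin_score(
    [(230, 180, 110), (200, 140, 70)], skin_hist, non_skin_hist
)
```

In this sketch, pixels or components whose score exceeds a chosen threshold would be kept as skin candidates before the face and torso search; the threshold itself is another parameter the abstract leaves unspecified.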
