Self-Supervised Small Soccer Player Detection and Tracking

In a soccer game, the information provided by player detection and tracking offers crucial clues for analyzing and understanding tactical aspects of the game, including individual and team actions. State-of-the-art tracking algorithms achieve impressive results in the scenarios on which they have been trained, but they fail in challenging ones such as soccer games. This is frequently due to the players' small relative size and the similar appearance of players on the same team. Although a straightforward solution would be to retrain these models on a more specific dataset, the lack of publicly available annotated datasets of this kind calls for other effective solutions. In this work, we propose a self-supervised pipeline that is able to detect and track low-resolution soccer players under different recording conditions without any need for ground-truth data. Extensive quantitative and qualitative experimental results are presented to evaluate its performance. We also present a comparison with several state-of-the-art methods, showing that both the proposed detector and the proposed tracker achieve top-tier results, in particular in the presence of small players. Code available at "https://github.com/samuro95/Self-Supervised-Small-Soccer-Player-Detection-Tracking".
