Detection of the leading player in handball scenes using Mask R-CNN and STIPs

In team sports scenes recorded during training and lessons, it is common to have many players on the court, each with their own ball and each performing a different action. Our goal is to detect all players on the handball court and to determine the leading player, that is, the one performing the given handball technique, such as shooting at the goal, catching a ball, or dribbling. This is a challenging task: it requires an accurate object detector that can cope with cluttered scenes containing many objects, partial occlusion, and poor illumination, and it also requires additional information to determine which player is the leading one. We therefore propose a leading player detection method that combines the Mask R-CNN object detector with spatiotemporal interest points (STIPs), referred to as MR-CNN+STIPs. The proposed detector is evaluated on a custom sports video dataset acquired during handball training lessons, and its performance under different conditions is discussed.
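The abstract does not specify how the detections and the interest points are combined, so the following is only a minimal sketch of one plausible combination rule: pick the detected player whose bounding box contains the most spatiotemporal interest points. The function names, the box format, and the "most points wins" rule are all assumptions made for illustration; the boxes and points are toy values, not outputs of Mask R-CNN or of a STIP detector.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed formats (not taken from the paper):
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]              # (x, y) location of an interest point


def count_points_in_box(box: Box, points: Sequence[Point]) -> int:
    """Count the interest points that fall inside a bounding box."""
    x_min, y_min, x_max, y_max = box
    return sum(1 for x, y in points if x_min <= x <= x_max and y_min <= y <= y_max)


def select_leading_player(boxes: Sequence[Box], points: Sequence[Point]) -> Optional[int]:
    """Return the index of the box with the most interest points.

    Hypothetical rule: returns None when there are no boxes or when
    no box contains any interest point.
    """
    if not boxes:
        return None
    counts: List[int] = [count_points_in_box(b, points) for b in boxes]
    best = max(range(len(boxes)), key=lambda i: counts[i])
    return best if counts[best] > 0 else None


# Toy example: three player boxes and five interest points.
boxes = [(0, 0, 10, 20), (30, 0, 40, 20), (60, 0, 70, 20)]
points = [(32, 5), (35, 10), (38, 15), (5, 5), (100, 100)]
leader = select_leading_player(boxes, points)
print(leader)
```

In this toy input the second box (index 1) contains three of the five points, so it is selected. A real pipeline would likely accumulate points over several frames and could use the segmentation masks instead of boxes, but those choices are not described in the abstract.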
