Group Activity Detection from Trajectory and Video Data in Soccer

Group activity detection in soccer can be done by using either video data or player and ball trajectory data. In current soccer activity datasets, activities are labelled as atomic events without a duration. Given that the state-of-the-art activity detection methods are not well-defined for atomic actions, these methods cannot be used. In this work, we evaluated the effectiveness of activity recognition models for detecting such events, by using an intuitive non-maximum suppression process and evaluation metrics. We also considered the problem of explicitly modeling interactions between players and ball. For this, we propose self-attention models to learn and extract relevant information from a group of soccer players for activity detection from both trajectory and video data. We conducted an extensive study on the use of visual features and trajectory data for group activity detection in sports using a large scale soccer dataset provided by Sportlogiq. Our results show that most events can be detected using either vision or trajectory-based approaches with a temporal resolution of less than 0.5 seconds, and that each approach has unique challenges.

[1]  Kate Saenko,et al.  R-C3D: Region Convolutional 3D Network for Temporal Activity Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Ahmad Nickabadi,et al.  Convolutional Relational Machine for Group Activity Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Lei Chen,et al.  Deep Structured Models For Group Activity Recognition , 2015, BMVC.

[5]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[6]  Oliver Schulte,et al.  Apples-to-Apples : Clustering and Ranking NHL Players Using Location Information and Scoring Impact , 2017 .

[7]  Andrew Zisserman,et al.  Video Action Transformer Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  David A. Clausi,et al.  Hockey Action Recognition via Integrated Stacked Hourglass Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[9]  Yutaka Satoh,et al.  Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Iain Matthews,et al.  "Quality vs Quantity": Improved Shot Prediction in Soccer using Strategic Features from Spatiotemporal Data , 2015 .

[11]  Michael S. Ryoo,et al.  Fine-Grained Activity Recognition in Baseball Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[12]  Luke Bornn,et al.  Time Perception Machine: Temporal Point Processes for the When, Where and What of Activity Prediction , 2018, ArXiv.

[13]  Cordelia Schmid,et al.  AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Yaser Sheikh,et al.  Representing and Discovering Adversarial Team Behaviors Using Player Roles , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Cees G. M. Snoek,et al.  Actor-Transformers for Group Activity Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Luke Bornn,et al.  Deep Learning of Player Trajectory Representations for Team Activity Analysis , 2018 .

[18]  Li Wang,et al.  Learning Actor Relation Graphs for Group Activity Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  DarrellTrevor,et al.  Long-Term Recurrent Convolutional Networks for Visual Recognition and Description , 2017 .

[20]  Ruslan Salakhutdinov,et al.  Action Recognition using Visual Attention , 2015, NIPS 2015.

[21]  Lei Zhang,et al.  AutoLoc: Weakly-supervised Temporal Action Localization , 2018, ECCV.

[22]  Heiga Zen,et al.  WaveNet: A Generative Model for Raw Audio , 2016, SSW.

[23]  Paul Lukowicz,et al.  Performance metrics for activity recognition , 2011, TIST.

[24]  David A. Clausi,et al.  Temporal Hockey Action Recognition via Pose and Optical Flows , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[25]  James J. Little,et al.  Classification of Puck Possession Events in Ice Hockey , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[26]  Amit K. Roy-Chowdhury,et al.  W-TALC: Weakly-supervised Temporal Activity Localization and Classification , 2018, ECCV.

[27]  Abhinav Gupta,et al.  Videos as Space-Time Region Graphs , 2018, ECCV.

[28]  Oliver Schulte,et al.  A Markov Game model for valuing actions, locations, and team performance in ice hockey , 2017, Data Mining and Knowledge Discovery.

[29]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Luc Van Gool,et al.  stagNet: An Attentive Semantic RNN for Group Activity Recognition , 2018, ECCV.

[31]  Alexander G. Schwing,et al.  Diverse Generation for Multi-Agent Sports Games , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Matthew J. Hausknecht,et al.  Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Greg Mori,et al.  A Hierarchical Deep Temporal Model for Group Activity Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[35]  Rahul Sukthankar,et al.  Rethinking the Faster R-CNN Architecture for Temporal Action Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Yisong Yue,et al.  Data-Driven Ghosting using Deep Imitation Learning , 2017 .

[37]  Patrick Lucey,et al.  Where Will They Go? Predicting Fine-Grained Adversarial Multi-agent Motion Using Conditional Variational Autoencoders , 2018, ECCV.

[38]  Silvio Savarese,et al.  What are they doing? : Collective activity classification using spatio-temporal relationship among people , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[39]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  R. Zemel,et al.  Classifying NBA Offensive Plays Using Neural Networks , 2016 .

[41]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).