What Will Happen Next? Forecasting Player Moves in Sports Videos

Many popular team sports involve one team trying to score a goal against the other. During gameplay, defending players constantly try to predict the attackers' next move to prevent them from scoring, whereas attackers constantly try to predict the defenders' next move in order to evade them and score. Such behavior is a prime example of the general human faculty for making predictions about the future, which is an important facet of human intelligence. Finding an algorithmic solution for learning a model of the external world from sensory inputs in order to make forecasts remains an important unsolved problem. In this work, we develop a generic framework for forecasting future events in team sports videos directly from visual inputs. To this end, we introduce water polo and basketball datasets and compare the predictions of the proposed methods against those of expert and non-expert humans.
