Learning to Track and Identify Players from Broadcast Sports Videos

Tracking and identifying players in sports videos filmed with a single pan-tilt-zoom camera has many applications, but it is also a challenging problem. This paper introduces a system that tackles this difficult task. The system possesses the ability to detect and track multiple players, estimates the homography between video frames and the court, and identifies the players. The identification system combines three weak visual cues, and exploits both temporal and mutual exclusion constraints in a Conditional Random Field (CRF). In addition, we propose a novel Linear Programming (LP) Relaxation algorithm for predicting the best player identification in a video clip. In order to reduce the number of labeled training data required to learn the identification system, we make use of weakly supervised learning with the assistance of play-by-play texts. Experiments show promising results in tracking, homography estimation, and identification. Moreover, weakly supervised learning with play-by-play texts greatly reduces the number of labeled training examples required. The identification system can achieve similar accuracies by using merely 200 labels in weakly supervised learning, while a strongly supervised approach needs a least 20,000 labels.

[1]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Hai Tao,et al.  Viewpoint Invariant Pedestrian Recognition with an Ensemble of Localized Features , 2008, ECCV.

[3]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[5]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[6]  Roger Y. Tsai,et al.  A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses , 1987, IEEE J. Robotics Autom..

[7]  T. Başar,et al.  A New Approach to Linear Filtering and Prediction Problems , 2001 .

[8]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[9]  Jinxiang Chai,et al.  Interactive Tracking of 2D Generic Objects with Spacetime Optimization , 2008, ECCV.

[10]  Marc Levoy,et al.  Efficient variants of the ICP algorithm , 2001, Proceedings Third International Conference on 3-D Digital Imaging and Modeling.

[11]  Andrew Zisserman,et al.  Hello! My name is... Buffy'' -- Automatic Naming of Characters in TV Video , 2006, BMVC.

[12]  A. Ng Feature selection, L1 vs. L2 regularization, and rotational invariance , 2004, Twenty-first international conference on Machine learning - ICML '04.

[13]  Chung-Lin Huang,et al.  Semantic analysis of soccer video using dynamic Bayesian network , 2006, IEEE Transactions on Multimedia.

[14]  Brendan J. Frey,et al.  A Revolution: Belief Propagation in Graphs with Cycles , 1997, NIPS.

[15]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[16]  Shervin Mohammadi Tari,et al.  Automatic initialization for broadcast sports videos rectification , 2011 .

[17]  David G. Lowe,et al.  Shape Descriptors for Maximally Stable Extremal Regions , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[20]  Jiawei Han,et al.  Semi-supervised Discriminant Analysis , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[21]  Jia Liu,et al.  Automatic Player Detection, Labeling and Tracking in Broadcast Soccer Video , 2007, BMVC.

[22]  Ankur Agarwal,et al.  Tracking Articulated Motion Using a Mixture of Autoregressive Models , 2004, ECCV.

[23]  Deva Ramanan,et al.  Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces , 2010, ECCV.

[24]  Patrick Pérez,et al.  Color-Based Probabilistic Tracking , 2002, ECCV.

[25]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[26]  Jean Ponce,et al.  Automatic annotation of human actions in video , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[27]  Chiou-Ting Hsu,et al.  Fusion of audio and motion information on HMM-based highlight extraction for baseball games , 2006, IEEE Transactions on Multimedia.

[28]  Ramakant Nevatia,et al.  Robust Object Tracking by Hierarchical Association of Detection Responses , 2008, ECCV.

[29]  B. Taskar,et al.  Learning from ambiguously labeled images , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Wolfgang Förstner,et al.  Detecting interpretable and accurate scale-invariant keypoints , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[31]  Min-Chun Hu,et al.  Robust Camera Calibration and Player Tracking in Broadcast Basketball Video , 2011, IEEE Transactions on Multimedia.

[32]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[33]  Andrew Zisserman,et al.  “Who are you?” - Learning person specific classifiers from video , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Alberto Del Bimbo,et al.  Soccer players identification based on visual local features , 2007, CIVR '07.

[35]  Zhengyou Zhang,et al.  A Flexible New Technique for Camera Calibration , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[36]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[37]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[38]  Ramakant Nevatia,et al.  Learning affinities and dependencies for multi-target tracking using a CRF model , 2011, CVPR 2011.

[39]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[40]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[41]  Noel E. O'Connor,et al.  Event detection in field sports video using audio-visual features and a support vector Machine , 2005, IEEE Transactions on Circuits and Systems for Video Technology.

[42]  Michael Beetz,et al.  Camera-based observation of football games for analyzing multi-agent activities , 2006, AAMAS '06.

[43]  Pascal Fua,et al.  Multicamera People Tracking with a Probabilistic Occupancy Map , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Ben Taskar,et al.  Talking pictures: Temporal grouping and dialog-supervised person recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[45]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[46]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[47]  Hua-Tsung Chen,et al.  Physics-based ball tracking and 3D trajectory reconstruction with applications to shooting location estimation in basketball video , 2009, J. Vis. Commun. Image Represent..

[48]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[49]  Ingemar J. Cox,et al.  A review of statistical data association techniques for motion correspondence , 1993, International Journal of Computer Vision.

[50]  Andrew Zisserman,et al.  Automatic face recognition for film character retrieval in feature-length films , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[51]  Kenji Okuma,et al.  Active exploration of training data for improved object detection , 2012 .

[52]  Zhengyou Zhang,et al.  Iterative point matching for registration of free-form curves and surfaces , 1994, International Journal of Computer Vision.

[53]  Neil A. Thacker,et al.  Real-time Body Tracking Using a Gaussian Process Latent Variable Model , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[54]  Ramakant Nevatia,et al.  Learning to associate: HybridBoosted multi-target tracker for crowded scene , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[55]  James J. Little,et al.  Optimizing Multiple Object Tracking and Best View Video Synthesis , 2008, IEEE Transactions on Multimedia.

[56]  Richard Szeliski,et al.  Computer Vision - Algorithms and Applications , 2011, Texts in Computer Science.

[57]  Barbara Caputo,et al.  Who's Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation , 2009, NIPS.

[58]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[59]  Songhwai Oh,et al.  Markov chain Monte Carlo data association for general multiple-target tracking problems , 2004, 2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. No.04CH37601).

[60]  James J. Little,et al.  Robust Visual Tracking for Multiple Targets , 2006, ECCV.

[61]  有木 康雄,et al.  Deformable Part Modelを用いた顔部品検出 , 2015 .

[62]  Andrew Zisserman,et al.  Learning sign language by watching TV (using weakly aligned subtitles) , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Xiaojin Zhu,et al.  Introduction to Semi-Supervised Learning , 2009, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[64]  Chia-Ling Tsai,et al.  The dual-bootstrap iterative closest point algorithm with application to retinal image registration , 2003, IEEE Transactions on Medical Imaging.

[65]  Gang Qian,et al.  Monocular 3D Tracking of Articulated Human Motion in Silhouette and Pose Manifolds , 2008, EURASIP J. Image Video Process..

[66]  Pascal Fua,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Multiple Object Tracking Using K-shortest Paths Optimization , 2022 .

[67]  James J. Little,et al.  A Boosted Particle Filter: Multitarget Detection and Tracking , 2004, ECCV.

[68]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[69]  H. Opower Multiple view geometry in computer vision , 2002 .

[70]  Cordelia Schmid,et al.  Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[71]  Alberto Del Bimbo,et al.  Player identification in soccer videos , 2005, MIR '05.

[72]  Erica Klarreich,et al.  Hello, my name is… , 2014, CACM.

[73]  Ben Taskar,et al.  Learning from Partial Labels , 2011, J. Mach. Learn. Res..

[74]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[75]  Ramakant Nevatia,et al.  Multi-target tracking by on-line learned discriminative appearance models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[76]  Mark W. Schmidt,et al.  Optimizing Costly Functions with Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm , 2009, AISTATS.

[77]  Alan Fern,et al.  Discriminatively trained particle filters for complex multi-object tracking , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[78]  David A. McAllester,et al.  Cascade object detection with deformable part models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[79]  Maria-Florina Balcan,et al.  Person Identification in Webcam Images: An Application of Semi-Supervised Learning , 2005 .

[80]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[81]  Andrew Zisserman,et al.  Multiple View Geometry in Computer Vision (2nd ed) , 2003 .

[82]  Dorin Comaniciu,et al.  Kernel-Based Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[83]  A. Murat Tekalp,et al.  Automatic soccer video analysis and summarization , 2003, IEEE Trans. Image Process..

[84]  Wen Gao,et al.  Jersey number detection in sports video for athlete identification , 2005, Visual Communications and Image Processing.

[85]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[86]  A. Karimi,et al.  Master‟s thesis , 2011 .

[87]  Paul A. Viola,et al.  Detecting Pedestrians Using Patterns of Motion and Appearance , 2005, International Journal of Computer Vision.

[88]  Robert T. Collins,et al.  Multi-target Data Association by Tracklets with Unsupervised Parameter Estimation , 2008, BMVC.

[89]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[90]  David J. Fleet,et al.  3D People Tracking with Gaussian Process Dynamical Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[91]  Ramakant Nevatia,et al.  Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet based Part Detectors , 2007, International Journal of Computer Vision.

[92]  Harry Shum,et al.  Interactive Offline Tracking for Color Objects , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[93]  Alan Fern,et al.  Improved Video Registration using Non-Distinctive Local Image Features , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[94]  James J. Little,et al.  Tracking and recognizing actions of multiple hockey players using the boosted particle filter , 2009, Image Vis. Comput..

[95]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[96]  James J. Little,et al.  Using Line and Ellipse Features for Rectification of Broadcast Hockey Video , 2011, 2011 Canadian Conference on Computer and Robot Vision.

[97]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[98]  Tiziana D'Orazio,et al.  A review of vision-based systems for soccer video analysis , 2010, Pattern Recognit..

[99]  Pascal Fua,et al.  Tracking multiple people under global appearance constraints , 2011, 2011 International Conference on Computer Vision.

[100]  Alessandro Perina,et al.  Person re-identification by symmetry-driven accumulation of local features , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[101]  Antonio Torralba,et al.  LabelMe video: Building a video database with human annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[102]  Mubarak Shah,et al.  Modeling inter-camera space-time and appearance relationships for tracking across non-overlapping views , 2008, Comput. Vis. Image Underst..

[103]  James J. Little,et al.  Identifying players in broadcast sports videos using conditional random fields , 2011, CVPR 2011.

[104]  Kenneth Steiglitz,et al.  Combinatorial Optimization: Algorithms and Complexity , 1981 .

[105]  Larry S. Davis,et al.  Understanding videos, constructing plots learning a visually grounded storyline model from annotated videos , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[106]  James J. Little,et al.  AUTOMATIC RECTIFICATION OF LONG IMAGE SEQUENCES , 2003 .

[107]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[108]  Hrvoje Dujmić,et al.  Player Number Localization and Recognition in Soccer Video using HSV Color Space and Internal Contours , 2008 .

[109]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[110]  Bi Song,et al.  A Stochastic Graph Evolution Framework for Robust Multi-target Tracking , 2010, ECCV.

[111]  James Orwell,et al.  Occlusion analysis: Learning and utilising depth maps in object tracking , 2008, Image Vis. Comput..

[112]  Wen Gao,et al.  Event Tactic Analysis Based on Broadcast Sports Video , 2009, IEEE Trans. Multim..

[113]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[114]  Hoon Kim,et al.  Monte Carlo Statistical Methods , 2000, Technometrics.