Simultaneous Tracking and Action Recognition using the PCA-HOG Descriptor

This paper presents a template-based algorithm that tracks athletes and recognizes their actions in an integrated system using only visual information. Conventional template-based action recognition systems usually treat action recognition and tracking as two independent problems and solve them separately. In contrast, our algorithm couples tracking and action recognition tightly in a single framework, where tracking assists action recognition and vice versa. Moreover, this paper proposes to represent the athletes by the PCA-HOG descriptor, which is computed by first transforming the image region containing an athlete into a grid of Histograms of Oriented Gradients (HOG) descriptors and then projecting the result onto a linear subspace by Principal Component Analysis (PCA). The PCA-HOG descriptor not only makes the tracker robust to illumination, pose, and viewpoint changes, but also implicitly centers the figure in the tracking region, which makes action recognition possible. Empirical results on hockey and soccer sequences show the effectiveness of this algorithm.
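
The two-step construction of the descriptor (a grid of HOG histograms, followed by a PCA projection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `hog_grid`, `fit_pca`, and `pca_hog`, the 4x4 cell grid, the 8 unsigned orientation bins, the global L2 normalization, the subspace dimension of 5, and the random stand-in patches are all assumptions made for illustration only.

```python
import numpy as np

def hog_grid(patch, cells=(4, 4), bins=8):
    """Grid of gradient-orientation histograms for a 2-D grayscale patch.

    Cell layout, bin count, and normalization are illustrative assumptions.
    """
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                  # derivatives along rows, columns
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)

    h, w = patch.shape
    ch, cw = h // cells[0], w // cells[1]
    feats = []
    for r in range(cells[0]):
        for c in range(cells[1]):
            sl = (slice(r * ch, (r + 1) * ch), slice(c * cw, (c + 1) * cw))
            # Magnitude-weighted orientation histogram of this cell.
            hist = np.bincount(bin_idx[sl].ravel(),
                               weights=mag[sl].ravel(),
                               minlength=bins)
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-8)        # global L2 normalization

def fit_pca(X, k):
    """Return the mean and the top-k principal directions of the rows of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def pca_hog(patch, mean, basis):
    """Project the HOG grid of a patch onto the learned linear subspace."""
    return basis @ (hog_grid(patch) - mean)

# Random patches stand in for cropped athlete regions (hypothetical data).
rng = np.random.default_rng(0)
patches = rng.random((20, 32, 32))
X = np.stack([hog_grid(p) for p in patches])     # one HOG vector per patch
mean, basis = fit_pca(X, k=5)
desc = pca_hog(patches[0], mean, basis)          # low-dimensional descriptor
```

In this sketch the subspace is learned once from a batch of training patches; how the subspace is actually learned and used during tracking is not specified in the abstract, so that part should be read as a placeholder.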
