论文信息 - Discriminative figure-centric models for joint action localization and recognition

Discriminative figure-centric models for joint action localization and recognition

In this paper we develop an algorithm for action recognition and localization in videos. The algorithm uses a figure-centric visual word representation. Different from previous approaches it does not require reliable human detection and tracking as input. Instead, the person location is treated as a latent variable that is inferred simultaneously with action recognition. A spatial model for an action is learned in a discriminative fashion under a figure-centric representation. Temporal smoothness over video sequences is also enforced. We present results on the UCF-Sports dataset, verifying the effectiveness of our model in situations where detection and tracking of individuals is challenging.

[1] Jitendra Malik,et al. Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[2] David B. Cooper,et al. Accurately Estimating Sherd 3D Surface Geometry with Application to Pot Reconstruction , 2003, 2003 Conference on Computer Vision and Pattern Recognition Workshop.

[3] Eli Shechtman,et al. Space-time behavior based correlation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4] Pietro Perona,et al. Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[5] Serge J. Belongie,et al. Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[6] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7] Bernt Schiele,et al. Integrating representative and discriminant models for object category detection , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[8] Juan Carlos Niebles,et al. Unsupervised Learning of Human Action Categories , 2006 .

[9] Cordelia Schmid,et al. Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10] Mubarak Shah,et al. Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Chih-Jen Lin,et al. LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[12] David A. McAllester,et al. A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Thorsten Joachims,et al. Learning structural SVMs with latent variables , 2009, ICML '09.

[14] Lior Wolf,et al. Local Trinary Patterns for human action recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[15] James J. Little,et al. Tracking and recognizing actions of multiple hockey players using the boosted particle filter , 2009, Image Vis. Comput..

[16] Thierry Artières,et al. Large margin training for hidden Markov models with partially observed states , 2009, ICML '09.

[17] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[18] Jake K. Aggarwal,et al. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19] Cordelia Schmid,et al. Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[20] Cordelia Schmid,et al. Actions in context , 2009, CVPR.

[21] Jitendra Malik,et al. Object detection using a max-margin Hough transform , 2009, CVPR.

[22] Alexander Klaser,et al. Learning human actions in video , 2010 .

[23] Thomas Deselaers,et al. Localizing Objects While Learning Their Appearance , 2010, ECCV.

[24] Adriana Kovashka,et al. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25] Rémi Ronfard,et al. A survey of vision-based methods for action representation, segmentation and recognition , 2011, Comput. Vis. Image Underst..

[26] Daphne Koller,et al. A segmentation-aware object detection model with occlusion handling , 2011, CVPR 2011.