论文信息 - Discovering discriminative action parts from mid-level video representations

Discovering discriminative action parts from mid-level video representations

We describe a mid-level approach for action recognition. From an input video, we extract salient spatio-temporal structures by forming clusters of trajectories that serve as candidates for the parts of an action. The assembly of these clusters into an action class is governed by a graphical model that incorporates appearance and motion constraints for the individual parts and pairwise constraints for the spatio-temporal dependencies among them. During training, we estimate the model parameters discriminatively. During classification, we efficiently match the model to a video using discrete optimization. We validate the model's classification ability in standard benchmark datasets and illustrate its potential to support a fine-grained analysis that not only gives a label to a video, but also identifies and localizes its constituent parts.

[1] R. Cattell. The Scree Test For The Number Of Factors. , 1966, Multivariate behavioral research.

[2] Pietro Perona,et al. Unsupervised Learning of Models for Recognition , 2000, ECCV.

[3] James W. Davis,et al. The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[4] Thorsten Joachims,et al. Optimizing search engines using clickthrough data , 2002, KDD.

[5] Christoph Schnörr,et al. Subgraph Matching with Semidefinite Programming , 2003, Electron. Notes Discret. Math..

[6] Alan L. Yuille,et al. The Concave-Convex Procedure , 2003, Neural Computation.

[7] Barbara Caputo,et al. Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[8] Martial Hebert,et al. A spectral technique for correspondence problems using pairwise constraints , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[9] Jianbo Shi,et al. Balanced Graph Matching , 2006, NIPS.

[10] Vladimir Kolmogorov,et al. Convergent Tree-Reweighted Message Passing for Energy Minimization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Ronen Basri,et al. Actions as Space-Time Shapes , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Martial Hebert,et al. Event Detection in Crowded Videos , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[13] David A. Forsyth,et al. Searching Video for Complex Activities with Finite State Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14] Cordelia Schmid,et al. Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Mubarak Shah,et al. Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Stefano Soatto,et al. Localizing Objects with Smart Dictionaries , 2008, ECCV.

[17] Cordelia Schmid,et al. A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[18] Vladimir Kolmogorov,et al. Feature Correspondence Via Graph Matching: Models and Global Optimization , 2008, ECCV.

[19] Lior Wolf,et al. Local Trinary Patterns for human action recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20] Martial Hebert,et al. Trajectons: Action recognition through the motion analysis of tracked features , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[21] David A. McAllester,et al. Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Jintao Li,et al. Hierarchical spatio-temporal context modeling for action recognition , 2009, CVPR.

[23] Matthew B. Blaschko,et al. Simultaneous Object Detection and Ranking with Weak Supervision , 2010, NIPS.

[24] Jitendra Malik,et al. Object Segmentation by Long Term Analysis of Point Trajectories , 2010, ECCV.

[25] Juan Carlos Niebles,et al. Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification , 2010, ECCV.

[26] James M. Rehg,et al. Temporal causality for the analysis of visual events , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27] William Brendel,et al. Activities as Time Series of Human Postures , 2010, ECCV.

[28] Stefano Soatto,et al. Tracklet Descriptors for Action Modeling and Video Analysis , 2010, ECCV.

[29] Yang Wang,et al. Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30] Cordelia Schmid,et al. Action recognition by dense trajectories , 2011, CVPR 2011.

[31] Yang Wang,et al. Discriminative figure-centric models for joint action localization and recognition , 2011, 2011 International Conference on Computer Vision.

[32] Michael G. Rabbat,et al. GANC: Greedy Agglomerative Normalized Cut , 2011, ArXiv.

[33] Mubarak Shah,et al. Action recognition in videos acquired by a moving camera using motion decomposition of Lagrangian particle trajectories , 2011, 2011 International Conference on Computer Vision.

[34] William Brendel,et al. Learning spatiotemporal graphs of human activities , 2011, 2011 International Conference on Computer Vision.

[35] Andrew Zisserman,et al. Efficient Additive Kernels via Explicit Feature Maps , 2012, IEEE Trans. Pattern Anal. Mach. Intell..