Paper information - A Hough transform-based voting framework for action recognition

A Hough transform-based voting framework for action recognition

We present a method to classify and localize human actions in video using a Hough transform voting framework. Random trees are trained to learn a mapping between densely-sampled feature patches and their corresponding votes in a spatio-temporal-action Hough space. The leaves of the trees form a discriminative multi-class codebook that shares features between the action classes and votes for action centers in a probabilistic manner. Using low-level features such as gradients and optical flow, we demonstrate that Hough-voting can achieve state-of-the-art performance on several datasets covering a wide range of action-recognition scenarios.
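The voting step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `accumulate_votes` and `strongest_hypothesis` are hypothetical, and the votes are hand-written placeholders standing in for what trained tree leaves would store (an action class, an offset to the action center, and a weight). It only shows how weighted votes from patches might be summed in a class-by-space-by-time accumulator and the strongest hypothesis read off.

```python
import numpy as np


def accumulate_votes(patch_positions, leaf_votes, space_shape, num_classes):
    """Sum weighted votes in a (class, x, y, t) Hough accumulator.

    patch_positions: list of (x, y, t) integer patch centers.
    leaf_votes: for each patch, a list of (class_id, (dx, dy, dt), weight)
        tuples. These are assumed placeholders for learned leaf statistics.
    """
    hough = np.zeros((num_classes, *space_shape), dtype=float)
    upper = np.array(space_shape)
    for pos, votes in zip(patch_positions, leaf_votes):
        for class_id, offset, weight in votes:
            center = np.array(pos) + np.array(offset)
            # Ignore votes that fall outside the video volume.
            if np.all(center >= 0) and np.all(center < upper):
                hough[(class_id, *center)] += weight
    return hough


def strongest_hypothesis(hough):
    """Return (class_id, (x, y, t)) at the accumulator maximum."""
    class_id, x, y, t = np.unravel_index(np.argmax(hough), hough.shape)
    return int(class_id), (int(x), int(y), int(t))


# Toy input: three patches with hand-written votes (illustrative only).
positions = [(2, 2, 1), (4, 3, 2), (1, 5, 3)]
votes = [
    [(0, (1, 1, 1), 0.6), (1, (0, 0, 0), 0.4)],
    [(0, (-1, 0, 0), 0.7)],
    [(1, (2, -2, -1), 0.5)],
]
hough = accumulate_votes(positions, votes, space_shape=(8, 8, 4), num_classes=2)
print(strongest_hypothesis(hough))
```

In this toy input, two patches vote for class 0 at the same center, so that cell accumulates the largest weight. A practical system would typically smooth the accumulator before searching for maxima, which this sketch omits.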
