Paper Information - Pose Filter Based Hidden-CRF Models for Activity Detection

Detecting activities that involve a sequence of complex pose and motion changes in unsegmented videos is a challenging task. Common approaches use sequential graphical models to infer the human pose-state in every frame. We propose an alternative model based on detecting the key-poses in a video, in which only the temporal positions of a few key-poses are inferred. We also introduce a novel pose summarization algorithm that automatically discovers the key-poses of an activity. We learn a detection filter for each key-pose and combine these filters with a bag-of-words root filter in an HCRF model, whose parameters are learned using latent-SVM optimization. We evaluate the detection performance of our model on unsegmented videos from four human action datasets, which include challenging crowded scenes with dynamic backgrounds, inter-person occlusions, multi-human interactions, and hard-to-detect daily-use objects.
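The idea of inferring only the temporal positions of a few key-poses can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the paper's inference procedure: it assumes the key-poses must occur in a fixed temporal order, that each key-pose filter has already produced a per-frame response score, and that the root filter contributes a single additive score. The function name `place_key_poses` and its inputs are hypothetical, and any temporal-displacement terms the actual HCRF may use are omitted.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def place_key_poses(
    responses: List[List[float]], root_score: float = 0.0
) -> Tuple[float, List[int]]:
    """Pick one frame per key-pose, in strictly increasing temporal order,
    maximizing the sum of filter responses plus a root score.

    responses[k][t] is the (hypothetical) response of key-pose filter k
    at frame t. Returns (total score, list of chosen frame indices).
    """
    num_poses = len(responses)
    num_frames = len(responses[0])

    # best[k][t]: best score of poses 0..k with pose k placed at frame t.
    best = [[NEG_INF] * num_frames for _ in range(num_poses)]
    # back[k][t]: frame chosen for pose k-1 in that best configuration.
    back = [[-1] * num_frames for _ in range(num_poses)]
    best[0] = list(responses[0])

    for k in range(1, num_poses):
        run_max, run_arg = NEG_INF, -1  # best of pose k-1 over frames < t
        for t in range(num_frames):
            if run_arg >= 0:
                best[k][t] = responses[k][t] + run_max
                back[k][t] = run_arg
            if best[k - 1][t] > run_max:
                run_max, run_arg = best[k - 1][t], t

    last = max(range(num_frames), key=lambda t: best[num_poses - 1][t])
    if best[num_poses - 1][last] == NEG_INF:
        raise ValueError("fewer frames than key-poses")

    # Backtrack from the last key-pose to recover all positions.
    positions = [0] * num_poses
    t = last
    for k in range(num_poses - 1, -1, -1):
        positions[k] = t
        t = back[k][t]

    return best[num_poses - 1][last] + root_score, positions
```

With a running maximum over earlier frames, the search costs time proportional to the number of key-poses times the number of frames, which is one reason a key-pose formulation can be cheaper than labeling a pose-state in every frame.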

Ramakant Nevatia | Prithviraj Banerjee
