A unified framework for locating and recognizing human actions

In this paper, we present a pose based approach for locating and recognizing human actions in videos. In our method, human poses are detected and represented based on deformable part model. To our knowledge, this is the first work on exploring the effectiveness of deformable part models in combining human detection and pose estimation into action recognition. Comparing with previous methods, ours have three main advantages. First, our method does not rely on any assumption on video preprocessing quality, such as satisfactory foreground segmentation or reliable tracking; Second, we propose a novel compact representation for human pose which works together with human detection and can well represent the spatial and temporal structures inside an action; Third, with human detection taken into consideration in our framework, our method has the ability to locate and recognize multiple actions in the same scene. Experiments on benchmark datasets and recorded cluttered videos verified the efficacy of our method.

[1]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[2]  Yang Wang,et al.  Learning a discriminative hidden part model for human action recognition , 2008, NIPS.

[3]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[4]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Pinar Duygulu Sahin,et al.  Histogram of oriented rectangles: A new pose descriptor for human action recognition , 2009, Image Vis. Comput..

[6]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[7]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Cristian Sminchisescu,et al.  Training Deformable Models for Localization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[10]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[12]  Ronen Basri,et al.  Actions as Space-Time Shapes , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Fei-FeiLi,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2008 .

[14]  Yang Wang,et al.  Recognizing human actions from still images with latent poses , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  David A. Forsyth,et al.  Automatic Annotation of Everyday Movements , 2003, NIPS.

[16]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[17]  Benjamin Z. Yao,et al.  Learning deformable action templates from cluttered videos , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[18]  Václav Hlavác,et al.  Pose primitive based human action recognition in videos or still images , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[20]  Ying Wang,et al.  Human Activity Recognition Based on R Transform , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Yihong Gong,et al.  Latent Pose Estimator for Continuous Action Recognition , 2008, ECCV.

[23]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[24]  Mubarak Shah,et al.  Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).