Dense body part trajectories for human action recognition

Several techniques have been proposed for human action recognition from videos. It has been observed that incorporating mid-level information (the human body) and/or high-level information (pose estimation) into the computation of low-level features (trajectories) yields the best performance in action recognition when the full body is presumed visible. However, for datasets with a large number of classes, where the full body may not be visible at all times, incorporating such mid- and high-level information remains unexplored. Moreover, any change or development in one stage requires recomputing all low-level features. We decouple mid-level and low-level feature computation and evaluate our approach on benchmark action recognition datasets, namely UCF50, UCF101 and HMDB51, which contain the largest numbers of action classes to date. Further, we employ a part-based model to detect human body parts statically in each frame, which also allows us to investigate classes in which the full body is not present. We then track dense regions around the detected body parts by Hungarian particle linking, which suppresses most wrongly detected body parts and enriches the mid-level information.
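
The Hungarian particle-linking step can be illustrated with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes per-frame body-part detections are given as 2D centre coordinates, uses SciPy's linear_sum_assignment (the Munkres/Hungarian algorithm) with a simple Euclidean cost, and applies a hypothetical distance gate to discard implausible links; the function name, cost, and threshold are illustrative assumptions.

```python
# Hypothetical sketch of Hungarian (Munkres) linking of body-part detections
# across consecutive frames. The cost function and distance gate are
# illustrative assumptions, not the paper's implementation.
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_detections(prev_parts, curr_parts, max_dist=30.0):
    """Match body-part detections in consecutive frames.

    prev_parts, curr_parts: (N, 2) and (M, 2) arrays of part centres (x, y).
    Returns a list of (i, j) index pairs linking frame t-1 to frame t.
    """
    if len(prev_parts) == 0 or len(curr_parts) == 0:
        return []
    # Pairwise Euclidean distances form the assignment cost matrix.
    cost = np.linalg.norm(prev_parts[:, None, :] - curr_parts[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    # Discard assignments whose displacement is implausibly large, which
    # helps suppress spurious (wrongly detected) parts.
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_dist]

# Example: three detections in frame t-1 matched against two in frame t;
# the distant third detection is left unmatched.
prev = np.array([[10.0, 20.0], [50.0, 60.0], [200.0, 200.0]])
curr = np.array([[12.0, 22.0], [48.0, 63.0]])
print(link_detections(prev, curr))  # [(0, 0), (1, 1)]
```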
