Action recognition: A region based approach

We address the problem of recognizing actions in reallife videos. Space-time interest point-based approaches have been widely prevalent towards solving this problem. In contrast, more spatially extended features such as regions have not been so popular. The reason is, any local region based approach requires the motion flow information for a specific region to be collated temporally. This is challenging as the local regions are deformable and not well delineated from the surroundings. In this paper we address this issue by using robust tracking of regions and we show that it is possible to obtain region descriptors for classification of actions. This paper lays the groundwork for further investigation into region based approaches. Through this paper we make the following contributions a) We advocate identification of salient regions based on motion segmentation b) We adopt a state-of-the art tracker for robust tracking of the identified regions rather than using isolated space-time blocks c) We propose optical flow based region descriptors to encode the extracted trajectories in piece-wise blocks. We demonstrate the performance of our system on real-world data sets.

[1]  Luc Van Gool,et al.  Exemplar-based Action Recognition in Video , 2009, BMVC.

[2]  Christopher Joseph Pal,et al.  Activity recognition using the velocity histories of tracked keypoints , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[3]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Lihi Zelnik-Manor,et al.  Statistical analysis of dynamic actions , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Lior Wolf,et al.  Local Trinary Patterns for human action recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  Mubarak Shah,et al.  Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Cordelia Schmid,et al.  Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[11]  Narendra Ahuja,et al.  Connected Segmentation Tree — A joint representation of region layout and hierarchy , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Gunnar Farnebäck Very high accuracy velocity estimation using orientation tensors , 2001, ICCV 2001.

[13]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[14]  Gary R. Bradski,et al.  Motion segmentation and pose recognition with motion history gradients , 2002, Machine Vision and Applications.

[15]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[16]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[17]  Cordelia Schmid,et al.  Actions in context , 2009, CVPR.

[18]  Ian D. Reid,et al.  Robust Real-Time Visual Tracking Using Pixel-Wise Posteriors , 2008, ECCV.

[19]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[20]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[21]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22]  Michael J. Black,et al.  Parameterized Modeling and Recognition of Activities , 1999, Comput. Vis. Image Underst..