Relative dense tracklets for human action recognition

This paper addresses the problem of recognizing human actions in video sequences for home care applications. Recent studies have shown that approaches based on a bag-of-words representation achieve high action recognition accuracy. Unfortunately, because these approaches ignore the spatial information of features, they have difficulty discriminating between similar actions. As we focus on recognizing subtle differences in the behaviour of patients, we propose a novel method that significantly enhances the discriminative properties of the bag-of-words technique. Our approach is based on a dynamic coordinate system, which introduces spatial information into the bag-of-words model by computing relative tracklets. We perform an extensive evaluation of our approach on three datasets: the popular KTH dataset, the challenging ADL dataset, and our own collected Hospital dataset. Experiments show that our representation enhances the discriminative power of the features and of the bag-of-words model, bringing significant improvements in action recognition performance.
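
The idea of relative tracklets in a bag-of-words pipeline can be illustrated with a minimal sketch. It assumes that a reference point (for example, a tracked anchor on the person) is available in every frame and that the "dynamic coordinate system" simply moves with that point. The descriptor layout (mean relative position concatenated with normalized displacements), the hard nearest-codeword assignment, and the function names `relative_tracklet`, `tracklet_descriptor` and `bow_histogram` are all hypothetical illustrations, not the authors' exact method.

```python
import numpy as np


def relative_tracklet(points, reference):
    """Express a tracklet in a coordinate system that moves with a reference point.

    points:    (T, 2) array of (x, y) positions of one tracked point.
    reference: (T, 2) array of (x, y) positions of the (assumed) reference
               point, one per frame of the tracklet.
    Returns a (T, 2) array of per-frame offsets from the reference point.
    """
    return np.asarray(points, dtype=float) - np.asarray(reference, dtype=float)


def tracklet_descriptor(rel):
    """Hypothetical descriptor for a relative tracklet.

    Concatenates the mean relative position (where the motion happens with
    respect to the reference point) with the frame-to-frame displacements
    (how the point moves), normalized by their total length.
    """
    disp = np.diff(rel, axis=0)
    total = np.linalg.norm(disp, axis=1).sum()
    if total > 0:
        disp = disp / total
    return np.concatenate([rel.mean(axis=0), disp.ravel()])


def bow_histogram(descriptors, codebook):
    """Bag-of-words encoding: assign each descriptor to its nearest codeword
    (squared Euclidean distance) and return an L1-normalized histogram."""
    d = np.asarray(descriptors, dtype=float)
    c = np.asarray(codebook, dtype=float)
    dists = ((d[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(c)).astype(float)
    return hist / hist.sum()


if __name__ == "__main__":
    ref = np.array([[100, 100], [101, 100], [102, 100]])
    above = np.array([[110, 90], [113, 90], [116, 90]])    # moves right, above the anchor
    below = np.array([[110, 150], [113, 150], [116, 150]])  # same motion, below the anchor
    print(tracklet_descriptor(relative_tracklet(above, ref)))
    print(tracklet_descriptor(relative_tracklet(below, ref)))
```

In this toy example the two tracklets have identical displacements, so a purely motion-based descriptor could not separate them, whereas the relative position term places them apart. Translating both the tracklet and the reference point by the same offset leaves the relative tracklet unchanged. In a real pipeline the codebook would typically be learned with k-means over training descriptors.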
