Learning action patterns in difference images for efficient action recognition

A new framework is presented for single-person oriented action recognition. This framework does not require detection/location of bounding boxes of human body nor motion estimation in each frame. The novel descriptor/pattern for action representation is learned with local temporal self-similarities (LTSSs) derived directly from difference images. The bag-of-words framework is then employed for action classification taking advantages of these descriptors. We investigated the effectiveness of the framework on two public human action datasets: the Weizmann dataset and the KTH dataset. In the Weizmann dataset, the proposed framework achieves a performance of 95.6% in the recognition rate and that of 91.1% in the KTH dataset, both of which are competitive with those of state-of-the-art approaches, but it has a high potential to achieve a faster execution performance.

[1]  Peyman Milanfar,et al.  Detection of human actions from a single example , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[2]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[3]  Manuel J. Marín-Jiménez,et al.  Learning Features for Human Action Recognition Using Multilayer Architectures , 2011, IbPRIA.

[4]  Gerhard Rigoll,et al.  Action Recognition in Meeting Scenarios using Global Motion Features , 2003 .

[5]  Ling Shao,et al.  Silhouette Analysis-Based Action Recognition Via Exploiting Human Poses , 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[6]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[7]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2008, International Journal of Computer Vision.

[8]  Hassan Foroosh,et al.  View-invariant action recognition using fundamental ratios , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Ling Shao,et al.  Human action segmentation and recognition via motion and shape analysis , 2012, Pattern Recognit. Lett..

[10]  Ronald Poppe,et al.  Vision-based human motion analysis: An overview , 2007, Comput. Vis. Image Underst..

[11]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[12]  S. SubrahmanianV.,et al.  Machine Recognition of Human Activities , 2008 .

[13]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Q. M. Jonathan Wu,et al.  Human action recognition using extreme learning machine based on visual vocabularies , 2010, Neurocomputing.

[15]  Ramakant Nevatia,et al.  Learning 3D action models from a few 2D videos for view invariant action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Larry S. Davis,et al.  Recognizing Human Actions by Learning and Matching Shape-Motion Prototype Trees , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Ramakant Nevatia,et al.  Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[19]  Dorin Comaniciu,et al.  Kernel-Based Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Ling Shao,et al.  Action recognition using Correlogram of Body Poses and spectral regression , 2011, 2011 18th IEEE International Conference on Image Processing.

[21]  Mohiuddin Ahmad,et al.  Human action recognition using shape and CLG-motion flow from multi-view image sequences , 2008, Pattern Recognit..

[22]  Ivan Laptev,et al.  On Space-Time Interest Points , 2005, International Journal of Computer Vision.

[23]  Takumi Kobayashi,et al.  Motion recognition using local auto-correlation of space-time gradients , 2012, Pattern Recognit. Lett..

[24]  Rama Chellappa,et al.  Machine Recognition of Human Activities: A Survey , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[25]  Thomas Serre,et al.  A Biologically Inspired System for Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[26]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, ICPR 2004.

[27]  Ling Shao,et al.  Histogram of Body Poses and Spectral Regression Discriminant Analysis for Human Action Categorization , 2010, BMVC.

[28]  Janusz Konrad,et al.  Action Recognition Using Sparse Representation on Covariance Manifolds of Optical Flow , 2010, 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance.

[29]  Yih-Fang Huang,et al.  A constrained vector quantization scheme for real-time codebook retransmission , 1994, IEEE Trans. Circuits Syst. Video Technol..

[30]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[31]  Yihong Gong,et al.  Human action detection by boosting efficient motion features , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[32]  Greg Mori,et al.  Action recognition by learning mid-level motion features , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Jitendra Malik,et al.  Estimating Human Body Configurations Using Shape Context Matching , 2002, ECCV.

[34]  Mineichi Kudo,et al.  Self-Similarities in Difference Images: A New Cue for Single-Person Oriented Action Recognition , 2013, IEICE Trans. Inf. Syst..

[35]  Rachid Benmokhtar Robust human action recognition scheme based on high-level feature fusion , 2012, Multimedia Tools and Applications.

[36]  Larry S. Davis,et al.  Motion-based recognition of people in EigenGait space , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[37]  Thomas B. Moeslund,et al.  View invariant gesture recognition using the CSEM SwissRanger SR-2 camera , 2008, Int. J. Intell. Syst. Technol. Appl..

[38]  Hong Bao,et al.  Video-Based Human Motion Analysis , 2011 .

[39]  Chiraz Ben Abdelkader Motion-Based Recognition of People in EigenGait Space , 2002 .

[40]  Sebastian Nowozin,et al.  Combining appearance and motion for human action classification in videos , 2009, CVPR 2009.

[41]  Changyin Sun,et al.  Supervised class-specific dictionary learning for sparse modeling in action recognition , 2012, Pattern Recognit..

[42]  Patrick Pérez,et al.  View-Independent Action Recognition from Temporal Self-Similarities , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Patrick Pérez,et al.  Cross-View Action Recognition from Temporal Self-similarities , 2008, ECCV.

[44]  Alberto Del Bimbo,et al.  Recognizing human actions by fusing spatio-temporal appearance and motion descriptors , 2009, 2009 16th IEEE International Conference on Image Processing (ICIP).