Pose-invariant kinematic features for action recognition

Recognition of actions from videos is a difficult task because of factors such as dynamic backgrounds, occlusion, and pose variation. To tackle the pose-variation problem, we propose a simple method based on a novel set of pose-invariant kinematic features that are encoded in a human-body-centric space. The proposed framework begins with detection of the neck point, which serves as the origin of the body-centric space. We propose a deep-learning-based classifier that detects the neck point from the output of a fully connected network layer. Using the detected neck point, a propagation mechanism divides the foreground region into head, torso, and leg grids. The motion observed in each of these body-part grids is represented by a set of pose-invariant kinematic features. These features represent the motion of the foreground (body) region relative to the motion of the detected neck point and are encoded, according to the view, in the human-body-centric space. Based on these features, pose-invariant action recognition can be achieved. Because a body-centric space is used, actions performed in non-upright postures can also be handled easily. To test the framework's effectiveness on such actions, a new dataset is introduced with 8 non-upright actions performed by 35 subjects in 3 different views. Experiments were conducted on benchmark datasets and on the newly proposed non-upright action dataset to identify limitations and gain insight into the proposed framework.
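As an illustration of the core idea only (motion measured relative to the neck and expressed in a body-centric frame), the following is a minimal sketch and not the paper's actual feature set. The function name `body_centric_features`, the `body_axis` input, the `torso_fraction` split, and the use of a per-part mean are all assumptions made for this example; the abstract does not specify how the grids or features are computed.

```python
import numpy as np

def body_centric_features(points, velocities, neck, neck_velocity,
                          body_axis, torso_fraction=0.5):
    """Mean neck-relative velocity per body part, in body-centric coordinates.

    points, velocities: (N, 2) foreground pixel positions and their motion.
    neck, neck_velocity: (2,) position and motion of the neck (frame origin).
    body_axis: (2,) vector from the neck toward the feet (hypothetical input).
    torso_fraction: hypothetical torso/leg split along the body axis.
    """
    points = np.asarray(points, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    neck = np.asarray(neck, dtype=float)
    neck_velocity = np.asarray(neck_velocity, dtype=float)

    # Orthonormal body frame: first row along the body, second row across it.
    axis = np.asarray(body_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    normal = np.array([-axis[1], axis[0]])
    basis = np.stack([axis, normal])

    # Shift to the neck origin, subtract the neck's motion, rotate into the frame.
    rel_pos = (points - neck) @ basis.T
    rel_vel = (velocities - neck_velocity) @ basis.T

    # Assign each point to head (above neck), torso, or legs (assumed split).
    along = rel_pos[:, 0]
    below = along[along >= 0]
    body_len = below.max() if below.size else 0.0
    labels = np.where(along < 0, 0,
                      np.where(along <= torso_fraction * body_len, 1, 2))

    features = {}
    for idx, name in enumerate(("head", "torso", "legs")):
        sel = labels == idx
        features[name] = rel_vel[sel].mean(axis=0) if sel.any() else np.zeros(2)
    return features
```

Because positions and velocities are both expressed relative to the neck and rotated into the body frame, rotating the whole scene (for example, a subject lying down instead of standing) leaves the output of this sketch unchanged, which is the kind of invariance the abstract describes for non-upright postures.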
