Human action recognition in RGB-D videos using motion sequence information and deep learning

An approach to recognizing human actions in RGB-D videos using motion sequence information and deep learning is proposed. A new representation of motion information for human action recognition, which emphasizes motion in various temporal regions, is introduced. Motion information from both the RGB and depth video streams is used. A t-SNE visualization of ConvNet features shows the discriminative characteristics of the proposed representation.

In this paper, we propose an approach for recognizing human actions in RGB-D video based on motion sequence information and deep learning. We present a new representation that emphasizes the key poses associated with each action. The features obtained from motion in the RGB and depth video streams are given as input to a convolutional neural network, which learns discriminative features. The efficacy of the proposed approach is demonstrated on the MIVIA action, NATOPS gesture, SBU Kinect interaction, and Weizmann datasets.
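The abstract does not spell out how the motion representation is computed. As a rough illustration only, the idea of emphasizing motion in a chosen temporal region could be sketched as a weighted accumulation of absolute frame differences. The function names, the Gaussian-shaped weighting, and the normalization below are all assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np


def temporal_region_weights(num_steps, center, width):
    """Hypothetical Gaussian-shaped weights emphasizing one temporal region.

    center and width are expressed as fractions of the sequence length.
    """
    t = np.linspace(0.0, 1.0, num_steps)
    return np.exp(-0.5 * ((t - center) / width) ** 2)


def weighted_motion_image(frames, weights):
    """Accumulate absolute frame differences with per-step weights.

    frames:  array of shape (T, H, W), e.g. grayscale RGB or depth frames.
    weights: array of shape (T - 1,), one weight per frame difference.
    Returns an (H, W) image scaled to [0, 1] (all zeros if there is no motion).
    """
    frames = np.asarray(frames, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0))        # shape (T - 1, H, W)
    image = np.tensordot(weights, diffs, axes=1)   # weighted sum over time
    peak = image.max()
    return image / peak if peak > 0 else image
```

Under these assumptions, the same function could be applied separately to the RGB and depth streams, and several images with different `center` values could be stacked as input channels for a convolutional network.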
