Graph Convolutional Networks with Objects for Skeleton-Based Action Recognition

Recently, graph convolutional neural networks have become a research hotspot for skeleton-based action recognition because of their excellent performance on graph-structured data. Compared to traditional methods, they can explicitly exploit the natural connectivity among the joints and offer greater expressive power. In this paper, we propose a two-stream graph convolutional network with objects for skeleton-based action recognition. We design an algorithm that matches similar skeletons in adjacent frames, so that the skeletons tracked across frames belong to the same person. It performs well even when irrelevant persons are present in the scene. In addition, features other than the human joints are rarely used in skeleton-based action recognition. We therefore introduce limb orientation information and related-object information. The related objects are treated as joint points linked to the hands. The two streams model coordinate features and orientation features respectively, and their results are fused into a single prediction. Our method achieves good results on the Kinetics dataset.
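The abstract does not specify the matching criterion, the orientation encoding, or the fusion rule, so the following is only a minimal sketch under stated assumptions: skeletons are matched by smallest mean joint distance, limb orientation is a unit vector along each bone, a related object is appended as one extra joint connected to a hand, and the two streams are fused by a weighted average of class scores. All function names, array shapes, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def match_skeleton(prev, candidates):
    """Pick the candidate skeleton most similar to the tracked one.

    prev:       (J, 2) joint coordinates of the tracked person, previous frame.
    candidates: (P, J, 2) skeletons detected in the current frame.
    Assumption: similarity is the mean Euclidean distance over joints.
    """
    dists = np.linalg.norm(candidates - prev, axis=-1).mean(axis=-1)
    return int(np.argmin(dists))


def limb_orientations(joints, bones):
    """Unit direction vector for every (parent, child) bone.

    joints: (J, 2) joint coordinates.
    bones:  list of (parent_index, child_index) pairs.
    """
    parents = joints[[p for p, _ in bones]]
    children = joints[[c for _, c in bones]]
    vec = children - parents
    norm = np.linalg.norm(vec, axis=-1, keepdims=True)
    # Guard against zero-length bones (e.g. undetected joints at the origin).
    return vec / np.maximum(norm, 1e-8)


def add_object_joint(joints, bones, obj_center, hand_idx):
    """Append an object centre as an extra joint linked to a hand joint."""
    new_joints = np.vstack([joints, np.asarray(obj_center, dtype=joints.dtype)])
    new_bones = list(bones) + [(hand_idx, len(joints))]
    return new_joints, new_bones


def fuse_scores(coord_scores, orient_scores, weight=0.5):
    """Weighted average of the class scores from the two streams."""
    return weight * coord_scores + (1.0 - weight) * orient_scores
```

In this sketch, `match_skeleton` would be called once per frame to keep following the same person, and the outputs of `limb_orientations` would feed the orientation stream while the raw coordinates (plus any object joint) feed the coordinate stream.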
