A Large-scale Varying-view RGB-D Action Dataset for Arbitrary-view Human Action Recognition

Current research on action recognition mainly focuses on single-view and multi-view recognition, which can hardly satisfy the requirement of human-robot interaction (HRI) applications to recognize actions from arbitrary views. The lack of suitable datasets is a further barrier. To provide data for arbitrary-view action recognition, we collect a new large-scale RGB-D action dataset for arbitrary-view action analysis, including RGB videos, depth sequences, and skeleton sequences. The dataset includes action samples captured from 8 fixed viewpoints, as well as varying-view sequences that cover the entire 360-degree range of view angles. In total, 118 participants performed 40 action categories, and 25,600 video samples were collected. Our dataset involves more participants, more viewpoints, and a large number of samples. More importantly, it is the first dataset containing varying-view sequences that cover the entire 360 degrees. The dataset provides sufficient data for multi-view, cross-view, and arbitrary-view action analysis. In addition, we propose a View-guided Skeleton CNN (VS-CNN) to tackle the problem of arbitrary-view action recognition. Experimental results show that the VS-CNN achieves superior performance.
