Action Recognition with Domain Invariant Features of Skeleton Image

Due to the fast processing-speed and robustness it can achieve, skeleton-based action recognition has recently received the attention of the computer vision community. The recent Convolutional Neural Network (CNN)-based methods have shown commendable performance in learning spatio-temporal representations for skeleton sequence, which use skeleton image as input to a CNN. Since the CNN-based methods mainly encoding the temporal and skeleton joints simply as rows and columns, respectively, the latent correlation related to all joints may be lost caused by the 2D convolution. To solve this problem, we propose a novel CNN-based method with adversarial training for action recognition. We introduce a two-level domain adversarial learning to align the features of skeleton images from different view angles or subjects, respectively, thus further improve the generalization. We evaluated our proposed method on NTU RGB+D. It achieves competitive results compared with state-of-the-art methods and 2.4%, 1.9%accuracy gain than the baseline for cross-subject and cross-view.

[1]  Qilong Wang,et al.  Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Mingsheng Long,et al.  Regressive Domain Adaptation for Unsupervised Keypoint Detection , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Ruxin Chen,et al.  Temporal Attentive Alignment for Large-Scale Video Domain Adaptation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Hichem Snoussi,et al.  Exploring a rich spatial-temporal dependent relational model for skeleton-based action recognition by bidirectional LSTM-CNN , 2020, Neurocomputing.

[5]  MarchandMario,et al.  Domain-adversarial training of neural networks , 2016 .

[6]  Austin Reiter,et al.  Interpretable 3D Human Action Analysis with Temporal Convolutional Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[7]  Gang Wang,et al.  Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks , 2017, IEEE Transactions on Image Processing.

[8]  Jiayi Luo,et al.  Skeleton-based action recognition by part-aware graph convolutional networks , 2019, The Visual Computer.

[9]  Mohan S. Kankanhalli,et al.  Unsupervised Learning of View-invariant Action Representations , 2018, NeurIPS.

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Pietro Zanuttigh,et al.  Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training , 2020, 2020 25th International Conference on Pattern Recognition (ICPR).

[12]  Trevor Darrell,et al.  Semi-Supervised Domain Adaptation via Minimax Entropy , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Jongmin Yu,et al.  Predictively encoded graph convolutional network for noise-robust skeleton-based action recognition , 2020, Applied Intelligence.

[14]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Daniel Cremers,et al.  Associative Domain Adaptation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  François Rameau,et al.  Unsupervised Intra-Domain Adaptation for Semantic Segmentation Through Self-Supervision , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Wenjun Zeng,et al.  An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data , 2016, AAAI.

[18]  Wing W. Y. Ng,et al.  Multi-Localized Sensitive Autoencoder-Attention-LSTM For Skeleton-based Action Recognition , 2022, IEEE Transactions on Multimedia.

[19]  Yang Zou,et al.  Domain Adaptation for Semantic Segmentation via Class-Balanced Self-Training , 2018, ArXiv.

[20]  William Robson Schwartz,et al.  Skeleton Image Representation for 3D Action Recognition Based on Tree Structure and Reference Joints , 2019, 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI).

[21]  Patrick Pérez,et al.  ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  K. S. Venkatesh,et al.  Deep Domain Adaptation in Action Space , 2018, BMVC.

[23]  Jefersson Alex dos Santos,et al.  SkeleMotion: A New Representation of Skeleton Joint Sequences based on Motion Information for 3D Action Recognition , 2019, 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[24]  Michael I. Jordan,et al.  Deep Transfer Learning with Joint Adaptation Networks , 2016, ICML.

[25]  Mengjie Zhang,et al.  Domain Adaptive Neural Networks for Object Recognition , 2014, PRICAI.

[26]  Ibrahim Batuhan Akkaya,et al.  Self-training Guided Adversarial Domain Adaptation For Thermal Imagery , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[27]  Hong Liu,et al.  Enhanced skeleton visualization for view invariant human action recognition , 2017, Pattern Recognit..

[28]  Sheng-Luen Chung,et al.  Global Spatio-Temporal Attention for Action Recognition Based on 3D Human Skeleton Data , 2020, IEEE Access.

[29]  Sanghoon Lee,et al.  Ensemble Deep Learning for Skeleton-Based Action Recognition Using Temporal Sliding LSTM Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Chao Li,et al.  Skeleton-based action recognition with convolutional neural networks , 2017, 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW).

[31]  Cordelia Schmid,et al.  Actor and Observer: Joint Modeling of First and Third-Person Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Koichi Shinoda,et al.  A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition , 2018, BMVC.

[33]  Philip David,et al.  Domain Adaptation for Semantic Segmentation of Urban Scenes , 2017 .

[34]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[35]  Jian Yang,et al.  Action-Attending Graphic Neural Network , 2017, IEEE Transactions on Image Processing.

[36]  Yoshua Bengio,et al.  Semi-supervised Learning by Entropy Minimization , 2004, CAP.

[37]  Jiaying Liu,et al.  Optimized Skeleton-based Action Recognition via Sparsified Graph Regression , 2018, ACM Multimedia.

[38]  Tieniu Tan,et al.  An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Kate Saenko,et al.  Deep CORAL: Correlation Alignment for Deep Domain Adaptation , 2016, ECCV Workshops.

[40]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[41]  Qi-Xing Huang,et al.  Domain Transfer Through Deep Activation Matching , 2018, ECCV.

[42]  Gang Wang,et al.  NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Mohammed Bennamoun,et al.  A New Representation of Skeleton Sequences for 3D Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Cornelia Caragea,et al.  Disaster Response Aided by Tweet Classification with a Domain Adaptation Approach , 2018 .

[45]  Bernhard Schölkopf,et al.  A Kernel Two-Sample Test , 2012, J. Mach. Learn. Res..