Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient

Recently, methods for skeleton-based human activity recognition have been shown to be vulnerable to adversarial attacks. However, these attack methods require either full knowledge of the victim model (i.e., white-box attacks), access to the training data (i.e., transfer-based attacks), or frequent model queries (i.e., black-box attacks). All of these requirements are highly restrictive, raising the question of how detrimental the vulnerability really is. In this paper, we show that the vulnerability indeed exists. To this end, we consider a new attack task in which the attacker has no access to the victim model, the training data, or the labels; we call this the hard no-box attack. Specifically, we first learn a motion manifold, on which we define an adversarial loss to compute a new gradient for the attack, named the skeleton-motion-informed (SMI) gradient. Our gradient contains information about the motion dynamics, unlike existing gradient-based attack methods, which compute the loss gradient assuming that each dimension of the data is independent. The SMI gradient can augment many gradient-based attack methods, leading to a new family of no-box attack methods. Extensive evaluation and comparison show that our method poses a real threat to existing classifiers. They also show that the SMI gradient improves the transferability and imperceptibility of adversarial samples in both no-box and transfer-based black-box settings.
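
The contrast between a gradient that treats every dimension as independent and one that carries motion dynamics can be illustrated with a toy sketch. This is not the SMI gradient as defined in the paper. It assumes, purely for illustration, first-order autoregressive dynamics x[t] = a * x[t-1] + noise, so that a change at one frame propagates to later frames. The function names, the coefficient `a`, and the sign-based update step are all hypothetical choices made for this example.

```python
import numpy as np

def independent_gradient(per_frame_grad):
    """Baseline: every frame is treated as independent, so the
    per-frame loss gradient is used unchanged."""
    return np.asarray(per_frame_grad, dtype=float).copy()

def dynamics_aware_gradient(per_frame_grad, a):
    """Toy dynamics-aware gradient under ASSUMED AR(1) dynamics
    x[t] = a * x[t-1] + noise (an illustration, not the paper's model).

    A perturbation at frame t reaches frame t+1 scaled by a, so the
    total derivative satisfies g[t] = per_frame_grad[t] + a * g[t+1].
    It is accumulated backwards in time.
    """
    grads = np.asarray(per_frame_grad, dtype=float)
    out = np.zeros_like(grads)
    carry = np.zeros_like(grads[0])
    for t in range(len(grads) - 1, -1, -1):
        carry = grads[t] + a * carry
        out[t] = carry
    return out

def sign_step(motion, grad, eps):
    """One sign-gradient update of size eps (FGSM-style step)."""
    return np.asarray(motion, dtype=float) + eps * np.sign(grad)

# Three frames, one coordinate each, with unit per-frame gradients.
frame_grads = np.ones((3, 1))
g_indep = independent_gradient(frame_grads)
g_dyn = dynamics_aware_gradient(frame_grads, a=0.5)
adv = sign_step(np.zeros((3, 1)), g_dyn, eps=0.01)
```

With `a = 0` the two gradients coincide. With a nonzero `a`, earlier frames receive larger gradient magnitudes because their perturbations influence everything that follows.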
