Investigating Vulnerability to Adversarial Examples on Multimodal Data Fusion in Deep Learning

The success of multimodal data fusion in deep learning is often attributed to the use of complementary information between multiple input modalities. Compared to their predictive performance, relatively little attention has been devoted to the robustness of multimodal fusion models. In this paper, we investigated whether current multimodal fusion models utilize this complementary information to defend against adversarial attacks. We applied gradient-based white-box attacks such as FGSM and PGD to MFNet, a major multispectral (RGB, thermal) fusion deep learning model for semantic segmentation. We verified that a multimodal fusion model optimized for better prediction remains vulnerable to adversarial attack, even when only one of the sensors is attacked. Thus, it is hard to say that existing multimodal data fusion models fully exploit the complementary relationships between modalities in terms of adversarial robustness. We believe that our observations open a new horizon for adversarial attack research on multimodal data fusion.
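The single-modality attack setting described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual MFNet experiment: the function `fgsm_single_modality`, the `ToyFusion` stand-in model, and all tensor shapes are assumptions introduced here for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_single_modality(model, rgb, thermal, labels, eps, attack_rgb=True):
    # Hypothetical sketch: FGSM that perturbs only ONE modality of a
    # two-input fusion model, leaving the other sensor's input clean.
    rgb = rgb.clone().detach().requires_grad_(attack_rgb)
    thermal = thermal.clone().detach().requires_grad_(not attack_rgb)
    loss = F.cross_entropy(model(rgb, thermal), labels)
    loss.backward()  # gradient flows only into the attacked modality
    if attack_rgb:
        adv = (rgb + eps * rgb.grad.sign()).clamp(0, 1).detach()
        return adv, thermal.detach()
    adv = (thermal + eps * thermal.grad.sign()).clamp(0, 1).detach()
    return rgb.detach(), adv

# Tiny stand-in fusion model (concatenate inputs, then classify);
# a real model like MFNet fuses convolutional encoder features instead.
class ToyFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 3)

    def forward(self, rgb, thermal):
        return self.fc(torch.cat([rgb, thermal], dim=1))

torch.manual_seed(0)
model = ToyFusion()
rgb, thermal = torch.rand(2, 4), torch.rand(2, 4)
labels = torch.tensor([0, 2])
adv_rgb, clean_thermal = fgsm_single_modality(model, rgb, thermal, labels, eps=0.1)
```

Under this setup, the thermal input is returned untouched while the RGB input receives an L-infinity perturbation bounded by `eps`; a PGD variant would simply iterate this step with projection onto the `eps`-ball.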

[1] Yasuyuki Matsushita et al., "Deep Learning for Multimodal Data Fusion," Multimodal Scene Understanding, 2019.

[2] Bernhard Schölkopf et al., "Adversarial Vulnerability of Neural Networks Increases with Input Dimension," arXiv, 2018.

[3] Fabio Roli et al., "Multimodal Person Reidentification Using RGB-D Cameras," IEEE Transactions on Circuits and Systems for Video Technology, 2016.

[4] Graham W. Taylor et al., "Deep Multimodal Learning: A Survey on Recent Advances and Trends," IEEE Signal Processing Magazine, 2017.

[5] Ke Lu et al., "An Overview of Multi-Modal Medical Image Fusion," Neurocomputing, 2016.

[6] Philip H. S. Torr et al., "On the Robustness of Semantic Segmentation Models to Adversarial Attacks," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[7] Nic Ford et al., "Adversarial Examples Are a Natural Consequence of Test Error in Noise," ICML, 2019.

[8] Tao Chen et al., "Semantic Segmentation of RGBD Images Based on Deep Depth Regression," Pattern Recognition Letters, 2018.

[9] Yuxiang Sun et al., "RTFNet: RGB-Thermal Fusion Network for Semantic Segmentation of Urban Scenes," IEEE Robotics and Automation Letters, 2019.

[10] Roberto Cipolla et al., "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.

[11] Daniel Cremers et al., "FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture," ACCV, 2016.

[12] Zhikui Chen et al., "A Survey on Deep Learning for Multimodal Data Fusion," Neural Computation, 2020.

[13] Xiang Li et al., "Deep Learning-Based Image Segmentation on Multimodal Medical Imaging," IEEE Transactions on Radiation and Plasma Medical Sciences, 2019.

[14] Aleksander Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks," ICLR, 2018.

[15] Jonathon Shlens et al., "Explaining and Harnessing Adversarial Examples," ICLR, 2015.

[16] Dapeng Tao et al., "Deep Multi-View Feature Learning for Person Re-Identification," IEEE Transactions on Circuits and Systems for Video Technology, 2018.

[17] Tatsuya Harada et al., "MFNet: Towards Real-Time Semantic Segmentation for Autonomous Vehicles with Multi-Spectral Scenes," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.

[18] Tae Joon Jun et al., "DAPAS: Denoising Autoencoder to Prevent Adversarial Attack in Semantic Segmentation," International Joint Conference on Neural Networks (IJCNN), 2020.