Towards Robust Sensor Fusion in Visual Perception

We study robust sensor fusion in visual perception, particularly in autonomous driving settings. We evaluate the robustness of RGB camera and LiDAR sensor fusion for binary classification and object detection, focusing on how different fusion methods behave under adversarial attacks on different sensors. We first train classification and detection models with early fusion and with late fusion, then evaluate them under different combinations of adversarial attacks on the two sensor inputs. We also study how the effectiveness of the attacks changes with the attack budget. Experimental results show that, while sensor fusion models are generally vulnerable to adversarial attacks, late fusion is more robust than early fusion. The results also provide insights into obtaining more robust sensor fusion models.
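To make the distinction between early and late fusion and the idea of a single-sensor attack concrete, the following is a minimal sketch in Python with NumPy. It is not the models or attacks used in this work: the linear scorers, the random weights, the feature dimension, the averaging rule for late fusion, and the names `early_logit`, `late_logit`, `fgsm`, and `attack_rgb_only` are all assumptions made for illustration. It applies one fast gradient sign step with an L-infinity budget `eps` to the RGB input only and reports the logits before and after.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def early_logit(rgb, lidar, w, b=0.0):
    # Early fusion: concatenate the sensor vectors, then apply one linear model.
    return float(np.concatenate([rgb, lidar]) @ w + b)


def late_logit(rgb, lidar, w_rgb, w_lidar, b=0.0):
    # Late fusion: score each sensor with its own model, then average the logits.
    return float(0.5 * (rgb @ w_rgb) + 0.5 * (lidar @ w_lidar) + b)


def fgsm(x, grad, eps):
    # One fast gradient sign step under an L-infinity budget eps.
    return x + eps * np.sign(grad)


def attack_rgb_only(eps, seed=0, dim=8):
    # Hypothetical toy setup: random features and random (untrained) weights.
    rng = np.random.default_rng(seed)
    rgb, lidar = rng.normal(size=dim), rng.normal(size=dim)
    w_early = rng.normal(size=2 * dim)
    w_rgb, w_lidar = rng.normal(size=dim), rng.normal(size=dim)
    y = 1.0  # true label of the binary classification example

    early_clean = early_logit(rgb, lidar, w_early)
    late_clean = late_logit(rgb, lidar, w_rgb, w_lidar)

    # Gradient of binary cross-entropy w.r.t. the RGB input:
    # (sigmoid(logit) - y) * d(logit)/d(rgb).
    g_early = (sigmoid(early_clean) - y) * w_early[:dim]
    g_late = (sigmoid(late_clean) - y) * 0.5 * w_rgb

    early_adv = early_logit(fgsm(rgb, g_early, eps), lidar, w_early)
    late_adv = late_logit(fgsm(rgb, g_late, eps), lidar, w_rgb, w_lidar)
    return {"early": (early_clean, early_adv), "late": (late_clean, late_adv)}


if __name__ == "__main__":
    for eps in (0.0, 0.1, 0.3):
        print(eps, attack_rgb_only(eps))
```

Because the toy weights are random rather than trained, the numbers it prints say nothing about which fusion method is more robust. The sketch only shows where the fusion happens in each design and how a per-sensor perturbation budget enters the evaluation.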
