Robustness Against Unknown Noise for Raw Data Fusing Neural Networks

Adverse weather conditions are extremely challenging for autonomous driving because most state-of-the-art sensors do not function reliably under such circumstances. One way to increase detection performance is to fuse raw sensor signals with neural networks that learn robust features from multiple inputs. However, noise due to adverse weather is complex, and automotive sensors fail asymmetrically. Compared to decision-level fusion, neural networks that perform feature-level sensor fusion can be particularly vulnerable when one sensor is corrupted by noise outside the training data distribution: no built-in mathematical mechanism prevents noise in one sensor channel from corrupting the overall network, even though other sensor channels may provide high-quality data. We propose a simple data augmentation scheme that shows a neural network may be able to ignore data from underperforming sensors, even though it has never seen that noise during training. This behavior can be summarized as a learned "OR" operation at the fusion stage. The learned operation also generalizes to noise types not present during training.
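To make the idea concrete, below is a minimal sketch of one plausible form of such an augmentation: during training, each sample has one randomly chosen sensor channel replaced by Gaussian noise while the label is kept unchanged, pushing the fusion network to rely on whichever channel still carries signal. The function name, corruption probability, and Gaussian noise model are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def augment_sensor_failure(batch, p_corrupt=0.5, noise_std=1.0, rng=None):
    """Randomly corrupt one sensor channel per sample (hypothetical sketch).

    batch : dict mapping sensor name -> array of shape (N, ...),
            e.g. {"camera": ..., "lidar": ...}
    Labels are left untouched by the caller, so the network must learn
    to ignore the corrupted channel rather than the target.
    """
    rng = rng or np.random.default_rng()
    sensors = list(batch.keys())
    n = batch[sensors[0]].shape[0]
    out = {s: batch[s].copy() for s in sensors}
    for i in range(n):
        if rng.random() < p_corrupt:
            # Pick one sensor and replace its signal with pure noise,
            # simulating an asymmetric sensor failure.
            s = sensors[rng.integers(len(sensors))]
            out[s][i] = rng.normal(0.0, noise_std, size=out[s][i].shape)
    return out

# Usage: camera and lidar inputs for a batch of 8 samples.
batch = {
    "camera": np.random.rand(8, 3, 64, 64).astype(np.float32),
    "lidar":  np.random.rand(8, 1, 64, 64).astype(np.float32),
}
augmented = augment_sensor_failure(batch)
```

Because the corrupting noise at training time need not match the noise encountered at test time, the network can learn the general rule "discard an implausible channel" rather than a filter for one specific distortion.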
