Generating robust real-time object detector with uncertainty via virtual adversarial training

Despite remarkable accuracy improvement in convolutional neural networks (CNNs) based object detectors, there are still some problems in applying on some safety–critical domain, such as the self-driving domain, in part due to the complexity of verifying the correctness of detecting results and the lack of safety guarantees. By simply modeling the bounding box parameters with a Gaussian distribution in a real-time object detector, we propose a new method for predicting uncertainty, which can quantify the reliability of the neural networks’ prediction, to validate the correctness of detecting results with low computational complexity. In addition, we redesign the loss function by adding a new regularization term, called virtual adversarial training (VAT). The use of VAT, which is defined as the robustness of the conditional label distribution around input data against local perturbation, can smooth the output distribution robust with lower uncertainty and the prediction from the regularized model will be better. In consideration of the trade-off between the size and speed, we choose some lightweight models as the backbone of a YOLOv3 detector and the experimental results on PASCAL VOC dataset and MS COCO demonstrate the effectiveness of the proposed approach.

[1]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[2]  Julien Cornebise,et al.  Weight Uncertainty in Neural Networks , 2015, ArXiv.

[3]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[5]  Charles Blundell,et al.  Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.

[6]  Shin Ishii,et al.  Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Yarin Gal,et al.  Understanding Measures of Uncertainty for Adversarial Example Detection , 2018, UAI.

[8]  Xiaochun Cao,et al.  Transferable Adversarial Attacks for Image and Video Object Detection , 2018, IJCAI.

[9]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[10]  Alex Kendall,et al.  What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[11]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[15]  Hyung-Il Kim,et al.  Localization Uncertainty Estimation for Anchor-Free Object Detection , 2020, ECCV Workshops.

[16]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[17]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[18]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[19]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[20]  Larry S. Davis,et al.  Soft-NMS — Improving Object Detection with One Line of Code , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[23]  Xiangyu Zhang,et al.  Bounding Box Regression With Uncertainty for Accurate Object Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Kumar Shridhar,et al.  Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference , 2018 .

[25]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[26]  Hong-Yuan Mark Liao,et al.  YOLOv4: Optimal Speed and Accuracy of Object Detection , 2020, ArXiv.

[27]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[28]  Dan Boneh,et al.  Ensemble Adversarial Training: Attacks and Defenses , 2017, ICLR.

[29]  Bingbing Ni,et al.  Scale-Transferrable Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Matthias Rottmann,et al.  MetaDetect: Uncertainty Quantification and Prediction Quality Estimates for Object Detection , 2020, 2021 International Joint Conference on Neural Networks (IJCNN).

[31]  Silvio Savarese,et al.  Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[33]  Sparsh Mittal,et al.  ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[34]  Xiangyu Zhang,et al.  ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.

[35]  Luca Rigazio,et al.  Towards Deep Neural Network Architectures Robust to Adversarial Examples , 2014, ICLR.

[36]  Shuicheng Yan,et al.  Dual Path Networks , 2017, NIPS.

[37]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[40]  Jun Zhu,et al.  Boosting Adversarial Attacks with Momentum , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Philip Bachman,et al.  Learning with Pseudo-Ensembles , 2014, NIPS.

[42]  Shyh Yaw Jou,et al.  A Novel lightweight Convolutional Neural Network, ExquisiteNetV2 , 2021, ArXiv.

[43]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[44]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[49]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Jiang Deng,et al.  A Gaussian mixture model based combined resampling algorithm for classification of imbalanced credit data sets , 2019, Int. J. Mach. Learn. Cybern..

[51]  Yan Li,et al.  Loss Rescaling by Uncertainty Inference For Single-Stage Object Detection , 2020, 2020 IEEE International Conference on Image Processing (ICIP).

[52]  Zoubin Ghahramani,et al.  Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference , 2015, ArXiv.

[53]  Qing Yang,et al.  Cascaded Context Dependency: An Extremely Lightweight Module For Deep Convolutional Neural Networks , 2020, 2020 IEEE International Conference on Image Processing (ICIP).

[54]  Seong Joon Oh,et al.  CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[55]  Kyungjae Lee,et al.  Uncertainty-Aware Learning from Demonstration Using Mixture Density Networks with Sampling-Free Variance Modeling , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[56]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[58]  Alan L. Yuille,et al.  Adversarial Examples for Semantic Segmentation and Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[59]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[60]  Muhammad Younus Javed,et al.  Multi-level features fusion and selection for human gait recognition: an optimized framework of Bayesian model and binomial distribution , 2019, Int. J. Mach. Learn. Cybern..

[61]  Hongyi Zhang,et al.  mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[62]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.