Deep Mixture Density Network for Probabilistic Object Detection

Mistakes/uncertainties in object detection could lead to catastrophes when deploying robots in the real world. In this paper, we measure the uncertainties of object localization to minimize this kind of risk. Uncertainties emerge upon challenging cases like occlusion. The bounding box borders of an occluded object can have multiple plausible configurations. We propose a deep multivariate mixture of Gaussians model for probabilistic object detection. The covariances help to learn the relationship between the borders, and the mixture components potentially learn different configurations of an occluded part. Quantitatively, our model improves the AP of the baselines by 3.9% and 1.4% on CrowdHuman and MS-COCO respectively with almost no computational or memory overhead. Qualitatively, our model enjoys explainability since the resulting covariance matrices and the mixture components help measure uncertainties.

[1]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[2]  Nanning Zheng,et al.  Single Image Super-resolution via a Lightweight Residual Convolutional Neural Network , 2017 .

[3]  Jianren Wang,et al.  Physics-Aware 3D Mesh Synthesis , 2019, 2019 International Conference on 3D Vision (3DV).

[4]  Shifeng Zhang,et al.  Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd , 2018, ECCV.

[5]  Xiangyu Zhang,et al.  Channel Pruning for Accelerating Very Deep Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Hyuk-Jae Lee,et al.  Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Zhiqiang Shen,et al.  MoBiNet: A Mobile Binary Network for Image Classification , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[8]  Alan Yuille,et al.  Detecting Semantic Parts on Partially Occluded Objects , 2017, BMVC.

[9]  C. Bishop Mixture density networks , 1994 .

[10]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[11]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[12]  Larry S. Davis,et al.  Soft-NMS — Improving Object Detection with One Line of Code , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Katerina Fragkiadaki,et al.  Epipolar Transformers , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Yuchun Ma,et al.  AddressNet: Shift-Based Primitives for Efficient Convolutional Neural Networks , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[16]  Xiangyu Zhang,et al.  Bounding Box Regression With Uncertainty for Accurate Object Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Yuandong Tian,et al.  Semantic Amodal Segmentation , 2015, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jianren Wang,et al.  Prediction-Tracking-Segmentation , 2019, ArXiv.

[19]  Alan L. Yuille,et al.  DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection Under Partial Occlusion , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Jian Yang,et al.  Occluded Pedestrian Detection Through Guided Attention in CNNs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Kush R. Varshney,et al.  On the Safety of Machine Learning: Cyber-Physical Systems, Decision Sciences, and Data Products , 2016, Big Data.

[22]  John Schulman,et al.  Concrete Problems in AI Safety , 2016, ArXiv.

[23]  Michael Milford,et al.  Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[24]  Stephen Lin,et al.  Deformable ConvNets V2: More Deformable, Better Results , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Song Han,et al.  AMC: AutoML for Model Compression and Acceleration on Mobile Devices , 2018, ECCV.

[26]  Larry S. Davis,et al.  SNIPER: Efficient Multi-Scale Training , 2018, NeurIPS.

[27]  Marios Savvides,et al.  Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[29]  Gustavo Carneiro,et al.  Probabilistic Object Detection: Definition and Evaluation , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[30]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Jitendra Malik,et al.  Amodal Instance Segmentation , 2016, ECCV.

[32]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[33]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[34]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Jitendra Malik,et al.  More Than a Feeling: Learning to Grasp and Regrasp Using Vision and Touch , 2018, IEEE Robotics and Automation Letters.

[37]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[38]  Jianren Wang,et al.  AlignNet: A Unifying Approach to Audio-Visual Alignment , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[39]  Sinan Kalkan,et al.  Localization Recall Precision (LRP): A New Performance Metric for Object Detection , 2018, ECCV.

[40]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Fernando A. Mujica,et al.  An Empirical Evaluation of Deep Learning on Highway Driving , 2015, ArXiv.

[42]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Xiangyu Zhang,et al.  CrowdHuman: A Benchmark for Detecting Human in a Crowd , 2018, ArXiv.