Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots

Accurate 3D object detection in LiDAR based point clouds suffers from the challenges of data sparsity and irregularities. Existing methods strive to organize the points regularly, e.g. voxelize, pass them through a designed 2D/3D neural network, and then define object-level anchors that predict offsets of 3D bounding boxes using collective evidences from all the points on the objects of interest. Contrary to the state-of-the-art anchor-based methods, based on the very nature of data sparsity, we observe that even points on an individual object part are informative about semantic information of the object. We thus argue in this paper for an approach opposite to existing methods using object-level anchors. Inspired by compositional models, which represent an object as parts and their spatial relations, we propose to represent an object as composition of its interior non-empty voxels, termed hotspots, and the spatial relations of hotspots. This gives rise to the representation of Object as Hotspots (OHS). Based on OHS, we further propose an anchor-free detection head with a novel ground truth assignment strategy that deals with inter-object point-sparsity imbalance to prevent the network from biasing towards objects with more points. Experimental results show that our proposed method works remarkably well on objects with a small number of points. Notably, our approach ranked 1st on KITTI 3D Detection Benchmark for cyclist and pedestrian detection, and achieved state-of-the-art performance on NuScenes 3D Detection Benchmark.

[1]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Yuning Jiang,et al.  FoveaBox: Beyound Anchor-Based Object Detection , 2019, IEEE Transactions on Image Processing.

[3]  Matthias Bethge,et al.  ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness , 2018, ICLR.

[4]  Nicholay Topin,et al.  Super-convergence: very fast training of neural networks using large learning rates , 2018, Defense + Commercial Sensing.

[5]  Stuart Geman,et al.  Context and Hierarchy in a Probabilistic Image Model , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[7]  Laurens van der Maaten,et al.  3D Semantic Segmentation with Submanifold Sparse Convolutional Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Philipp Krähenbühl,et al.  Center-based 3D Object Detection and Tracking , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jianping An,et al.  Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds , 2019, ArXiv.

[10]  Bin Yang,et al.  PIXOR: Real-time 3D Object Detection from Point Clouds , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Horst-Michael Groß,et al.  Complex-YOLO: An Euler-Region-Proposal for Real-Time 3D Object Detection on Point Clouds , 2018, ECCV Workshops.

[13]  Jiong Yang,et al.  PointPillars: Fast Encoders for Object Detection From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Sebastian Scherer,et al.  VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[15]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Frank Hutter,et al.  Fixing Weight Decay Regularization in Adam , 2017, ArXiv.

[17]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[18]  Mohamed Zahran,et al.  YOLO4D: A Spatio-temporal Approach for Real-time Multi-object Detection and Classification from LiDAR Point Clouds , 2018 .

[19]  Bo Li,et al.  SECOND: Sparsely Embedded Convolutional Detection , 2018, Sensors.

[20]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Benjin Zhu,et al.  Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection , 2019, ArXiv.

[23]  Adam Kortylewski,et al.  Greedy Structure Learning of Hierarchical Compositional Models , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Steven Lake Waslander,et al.  Joint 3D Proposal Generation and Object Detection from View Aggregation , 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[25]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[26]  D. Ramanan,et al.  What You See is What You Get: Exploiting Visibility for 3D Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[30]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Bin Yang,et al.  Deep Continuous Fusion for Multi-sensor 3D Object Detection , 2018, ECCV.

[32]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[33]  Xiaoli Hao,et al.  SARPNET: Shape attention regional proposal network for liDAR-based 3D object detection , 2020, Neurocomputing.

[34]  Bin Yang,et al.  Multi-Task Multi-Sensor Fusion for 3D Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Sanja Fidler,et al.  Learning a Hierarchical Compositional Shape Vocabulary for Multi-class Object Representation , 2014, ArXiv.

[36]  Marios Savvides,et al.  Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Ruigang Yang,et al.  IoU Loss for 2D/3D Object Detection , 2019, 2019 International Conference on 3D Vision (3DV).

[39]  Alan L. Yuille,et al.  DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection Under Partial Occlusion , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Ulrich Neumann,et al.  SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Wenze Hu,et al.  Unsupervised Learning of Dictionaries of Hierarchical Compositional Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Xingyi Zhou,et al.  Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Bo Yang,et al.  Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds , 2019, NeurIPS.

[45]  Frank Hutter,et al.  Decoupled Weight Decay Regularization , 2017, ICLR.

[46]  Zhixin Wang,et al.  Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[47]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Long Zhu,et al.  Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion , 2008, ECCV.

[50]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[51]  Ahmad El Sallab,et al.  YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud , 2018, ECCV Workshops.

[52]  Leonidas J. Guibas,et al.  Deep Hough Voting for 3D Object Detection in Point Clouds , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[53]  Carlos Vallespi-Gonzalez,et al.  LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Bin Chen,et al.  Dual Refinement Feature Pyramid Networks for Object Detection , 2020, ArXiv.

[55]  Xiaoyong Shen,et al.  STD: Sparse-to-Dense 3D Object Detector for Point Cloud , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[56]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Song Han,et al.  Point-Voxel CNN for Efficient 3D Deep Learning , 2019, NeurIPS.

[58]  Jiaya Jia,et al.  Fast Point R-CNN , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).