FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection

Monocular 3D object detection is an important task for autonomous driving considering its advantage of low cost. It is much more challenging than conventional 2D cases due to its inherent ill-posed property, which is mainly reflected in the lack of depth information. Recent progress on 2D detection offers opportunities to better solving this problem. However, it is non-trivial to make a general adapted 2D detector work in this 3D task. In this paper, we study this problem with a practice built on a fully convolutional single-stage detector and propose a general framework FCOS3D. Specifically, we first transform the commonly defined 7-DoF 3D targets to the image domain and decouple them as 2D and 3D attributes. Then the objects are distributed to different feature levels with consideration of their 2D scales and assigned only according to the projected 3D-center for the training procedure. Furthermore, the center-ness is redefined with a 2D Gaussian distribution based on the 3D-center to fit the 3D target formulation. All of these make this framework simple yet effective, getting rid of any 2D detection or 2D-3D correspondence priors. Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020. Code and models are released at https://github.com/open-mmlab/mmdetection3d.

[1]  Xiaoyong Shen,et al.  STD: Sparse-to-Dense 3D Object Detector for Point Cloud , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Xinge Zhu,et al.  Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds , 2020, CoRL.

[3]  Bernt Schiele,et al.  Kinematic 3D Object Detection in Monocular Video , 2020, ECCV.

[4]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Roberto Cipolla,et al.  Orthographic Feature Transform for Monocular 3D Object Detection , 2018, BMVC.

[6]  Bo Li,et al.  SECOND: Sparsely Embedded Convolutional Detection , 2018, Sensors.

[7]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[8]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[9]  Bin Xu,et al.  Multi-level Fusion Based 3D Object Detection from Monocular Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Christopher Zach,et al.  Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss , 2019, ArXiv.

[11]  Jiong Yang,et al.  PointPillars: Fast Encoders for Object Detection From Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[13]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Yan Wang,et al.  End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Yan Wang,et al.  Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xinge Zhu,et al.  Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Zhiwu Lu,et al.  Learning Depth-Guided Convolutions for Monocular 3D Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[20]  Jana Kosecka,et al.  3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yi Yang,et al.  DenseBox: Unifying Landmark Localization with End to End Object Detection , 2015, ArXiv.

[22]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[24]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Adrien Gaidon,et al.  ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Benjin Zhu,et al.  Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection , 2019, ArXiv.

[28]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[32]  Mingyang Li,et al.  MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yan Wang,et al.  Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving , 2019, ICLR.

[34]  Huaici Zhao,et al.  RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving , 2020, ECCV.

[35]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[36]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[37]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Xiaoming Liu,et al.  M3D-RPN: Monocular 3D Region Proposal Network for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[39]  Andrea Simonelli,et al.  Disentangling Monocular 3D Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[40]  Tao Kong,et al.  Task-Aware Monocular Depth Estimation for 3D Object Detection , 2020, AAAI.

[41]  Xinge Zhu,et al.  SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds , 2020, ECCV.

[42]  Xinge Zhu,et al.  Probabilistic and Geometric Depth: Detecting Objects in Perspective , 2021, CoRL.