Multi-View Reprojection Architecture for Orientation Estimation

In autonomous driving scenarios, pose estimation of surrounding vehicles and other objects is required in decision making and planning. This paper proposes Multi-View Reprojection Architecture, a flexible, highly accurate architecture to adopt any 2D detection network and extend it to regress the orientation of object's 3D bounding box and its dimensions. In contrast to previous techniques, our network incorporates geometric constraints on 3D box imposed by 2D detection box. In particular, we regress dimensions, orientation and reprojected boxes in multi-view obtained from novel 3D reconstruction layer using perspective geometry. In 3D reconstruction layer, we use an iterative refinement strategy to accurately recover 3D boxes even when 2D boxes are truncated. The proposed architecture is shown to outperform state-of-the-art methods on the challenging KITTI car orientation benchmark and obtain top results on 3D detection benchmark while running in real time, making it suitable for autonomous vehicles.

[1]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Yan Wang,et al.  Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  C. Qi Deep Learning on Point Sets for 3 D Classification and Segmentation , 2016 .

[5]  Silvio Savarese,et al.  Subcategory-Aware Convolutional Neural Networks for Object Proposals and Detection , 2016, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[6]  Steven L. Waslander,et al.  Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Bin Xu,et al.  Multi-level Fusion Based 3D Object Detection from Monocular Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Yan Lu,et al.  MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization , 2018, AAAI.

[9]  Jae Wook Jeon,et al.  Robust object proposals re-ranking for object detection in autonomous driving using convolutional neural networks , 2017, Signal Process. Image Commun..

[10]  Adrien Gaidon,et al.  ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Marius Leordeanu,et al.  Shift R-CNN: Deep Monocular 3D Object Detection With Closed-Form Geometric Constraints , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[12]  Jianping An,et al.  Voxel-FPN: multi-scale voxel feature aggregation in 3D object detection from point clouds , 2019, ArXiv.

[13]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[14]  Silvio Savarese,et al.  Data-driven 3D Voxel Patterns for object category recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[16]  Pascal Fua,et al.  Real-Time Seamless Single Shot 6D Object Pose Prediction , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Xiaogang Wang,et al.  GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  James M. Rehg,et al.  3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Jana Kosecka,et al.  3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Sanja Fidler,et al.  Monocular 3D Object Detection for Autonomous Driving , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Xiaoming Liu,et al.  M3D-RPN: Monocular 3D Region Proposal Network for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Roberto Cipolla,et al.  Orthographic Feature Transform for Monocular 3D Object Detection , 2018, BMVC.

[24]  Konrad Schindler,et al.  Are Cars Just 3D Boxes? Jointly Estimating the 3D Shape of Multiple Objects , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Nassir Navab,et al.  SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Kris Kitani,et al.  Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[27]  Christopher Zach,et al.  Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss , 2019, ArXiv.

[28]  Andrea Simonelli,et al.  Disentangling Monocular 3D Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Zhixin Wang,et al.  Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[31]  Thierry Chateau,et al.  Deep MANTA: A Coarse-to-Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis from Monocular Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Haojie Li,et al.  Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Stefano Soatto,et al.  Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors , 2019, AAAI.