MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time

Monocular multi-object detection and localization in 3D space has been proven to be a challenging task. The MoNet3D algorithm is a novel and effective framework that can predict the 3D position of each object in a monocular image and draw a 3D bounding box for each object. The MoNet3D method incorporates prior knowledge of the spatial geometric correlation of neighbouring objects into the deep neural network training process to improve the accuracy of 3D object localization. Experiments on the KITTI dataset show that the accuracy for predicting the depth and horizontal coordinates of objects in 3D space can reach 96.25\% and 94.74\%, respectively. Moreover, the method can realize the real-time image processing at 27.85 FPS, showing promising potential for embedded advanced driving-assistance system applications. Our code is publicly available at this https URL.

[1]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[2]  Jungwon Lee,et al.  Deep Robust Single Image Depth Estimation Neural Network Using Scene Understanding , 2019, CVPR Workshops.

[3]  C. Qi Deep Learning on Point Sets for 3 D Classification and Segmentation , 2016 .

[4]  Kris Kitani,et al.  Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[5]  Steven L. Waslander,et al.  Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Bin Xu,et al.  Multi-level Fusion Based 3D Object Detection from Monocular Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[9]  Sanja Fidler,et al.  3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Dacheng Tao,et al.  Deep Ordinal Regression Network for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Yan Lu,et al.  MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization , 2018, AAAI.

[12]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Xiaoming Liu,et al.  M3D-RPN: Monocular 3D Region Proposal Network for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Roberto Cipolla,et al.  Orthographic Feature Transform for Monocular 3D Object Detection , 2018, BMVC.

[15]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Guizhong Liu,et al.  3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results , 2019, ArXiv.

[17]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[18]  Adrien Gaidon,et al.  ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jiwen Lu,et al.  Deep Fitting Degree Scoring Network for Monocular 3D Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Vincent Lepetit,et al.  Robust 3D Object Tracking from Monocular Images Using Stable Parts , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Xiaogang Wang,et al.  GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Sanja Fidler,et al.  Monocular 3D Object Detection for Autonomous Driving , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Yin Zhou,et al.  VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Xuming He,et al.  3D Box Proposals From a Single Monocular Image of an Indoor Scene , 2018, AAAI.

[25]  Christopher Zach,et al.  Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss , 2019, ArXiv.

[26]  Zhenzhong Chen,et al.  MonoFENet: Monocular 3D Object Detection With Feature Enhancement Networks , 2019, IEEE Transactions on Image Processing.

[27]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Xiaogang Wang,et al.  PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Tian Xia,et al.  Vehicle Detection from 3D Lidar Using Fully Convolutional Network , 2016, Robotics: Science and Systems.

[30]  Marco Körner,et al.  MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification , 2019, 2019 IEEE Intelligent Transportation Systems Conference (ITSC).

[31]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[32]  Ji Wan,et al.  Multi-view 3D Object Detection Network for Autonomous Driving , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Jana Kosecka,et al.  3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).