MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation

Monocular 3D object detection has recently shown promising results, but challenging problems remain. One of them is the lack of invariance to camera intrinsic parameters, which becomes apparent across different 3D object datasets. Little effort has been made to exploit combinations of heterogeneous 3D object datasets. Contrary to general intuition, we show that more data does not automatically guarantee better performance; rather, methods need a degree of 'camera independence' in order to benefit from large and heterogeneous training data. In this paper, we propose a category-level pose estimation method based on instance segmentation that uses camera-independent geometric reasoning to cope with the varying camera viewpoints and intrinsics of different datasets. Every pixel of an instance predicts the object dimensions, the 3D object reference points projected into 2D image space and, optionally, the local viewing angle. Camera intrinsics are used only outside the learned network, to lift the predicted 2D reference points to 3D. We surpass camera-independent methods on the challenging KITTI3D benchmark and show the key benefits compared to camera-dependent methods.
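To illustrate how intrinsics can stay outside the network, the following is a minimal sketch of one possible lifting step, not the paper's actual procedure. It assumes an undistorted pinhole camera without skew, two predicted reference points (top and bottom of the object) lying at roughly the same depth, and a predicted metric object height. The function name `lift_to_3d` and all parameter names are hypothetical.

```python
def lift_to_3d(top_px, bottom_px, height_m, fx, fy, cx, cy):
    """Lift two projected reference points to a 3D centre in camera coordinates.

    Assumptions (not taken from the paper): pinhole model, no skew or
    distortion, and both reference points at approximately the same depth.
    top_px / bottom_px are (u, v) pixel coordinates; height_m is the
    predicted metric distance between the two reference points.
    """
    u_top, v_top = top_px
    u_bottom, v_bottom = bottom_px

    # Image v grows downwards, so the bottom point must have the larger v.
    pixel_height = v_bottom - v_top
    if pixel_height <= 0:
        raise ValueError("bottom reference point must lie below the top one")

    # Similar triangles: pixel_height / fy == height_m / z
    z = fy * height_m / pixel_height

    # Back-project the midpoint of the two reference points at depth z.
    u_mid = 0.5 * (u_top + u_bottom)
    v_mid = 0.5 * (v_top + v_bottom)
    x = (u_mid - cx) * z / fx
    y = (v_mid - cy) * z / fy
    return (x, y, z)


# Same scene seen by two hypothetical cameras; the second has twice the
# focal length and principal point, so all pixel coordinates double.
point_a = lift_to_3d((650.0, 150.0), (650.0, 220.0), 1.5,
                     fx=700.0, fy=700.0, cx=600.0, cy=180.0)
point_b = lift_to_3d((1300.0, 300.0), (1300.0, 440.0), 1.5,
                     fx=1400.0, fy=1400.0, cx=1200.0, cy=360.0)
```

Under these assumptions both calls recover the same 3D point. The predictions live in pixel space and metric object space, and only this final, non-learned step needs to know which camera produced the image.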
