DirectShape: Photometric Alignment of Shape Priors for Visual Vehicle Pose and Shape Estimation

3D scene understanding from images is a challenging problem which is encountered in robotics, augmented reality and autonomous driving scenarios. In this paper, we propose a novel approach to jointly infer the 3D rigid-body poses and shapes of vehicles from stereo images of road scenes. Unlike previous work that relies on geometric alignment of shapes with dense stereo reconstructions, our approach works directly on images and infers shape and pose efficiently through combined photometric and silhouette alignment of 3D shape priors with a stereo image. We use a shape prior that represents cars in a low-dimensional linear embedding of volumetric signed distance functions. For efficiently measuring the consistency with both alignment terms, we propose an adaptive sparse point selection scheme. In experiments, we demonstrate superior performance of our method in pose estimation and shape reconstruction over a state-of-the-art approach that uses geometric alignment with dense stereo reconstructions. Our approach can also boost the performance of deep-learning based approaches to 3D object detection as a refinement method. We demonstrate that our method significantly improves accuracy for several recent detection approaches.

[1]  Daniel Cremers,et al.  Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Yong-Sheng Chen,et al.  Pyramid Stereo Matching Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Anthony J. Yezzi,et al.  A Nonrigid Kernel-Based Framework for 2D-3D Pose Estimation and 2D Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Yan Wang,et al.  Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Ian D. Reid,et al.  Nonlinear shape manifolds as shape priors in level set segmentation and tracking , 2011, CVPR 2011.

[7]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Huimin Ma,et al.  3D Object Proposals for Accurate Object Class Detection , 2015, NIPS.

[9]  Carsten Rother,et al.  Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[11]  Daniel Cremers,et al.  LDSO: Direct Sparse Odometry with Loop Closure , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[12]  Yogesh Rathi,et al.  A Framework for Image Segmentation Using Shape Models and Kernel Space Shape Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Daniel Cremers,et al.  Semi-dense Visual Odometry for a Monocular Camera , 2013, 2013 IEEE International Conference on Computer Vision.

[14]  Shaojie Shen,et al.  Stereo R-CNN Based 3D Object Detection for Autonomous Driving , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Andreas Geiger,et al.  Joint 3D Object and Layout Inference from a Single RGB-D Image , 2015, GCPR.

[16]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[17]  Jana Kosecka,et al.  3D Bounding Box Estimation Using Deep Learning and Geometry , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Martial Hebert,et al.  3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding , 2013, 2013 IEEE International Conference on Computer Vision.

[19]  Simon Lucey,et al.  Object-Centric Photometric Bundle Adjustment with Deep Shape Prior , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[20]  George Drettakis,et al.  Automatic 3D Car Model Alignment for Mixed Image-Based Rendering , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[21]  Jianxiong Xiao,et al.  Sliding Shapes for 3D Object Detection in Depth Images , 2014, ECCV.

[22]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Bin Xu,et al.  Multi-level Fusion Based 3D Object Detection from Monocular Images , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Ian D. Reid,et al.  Dense Reconstruction Using 3D Object Shape Priors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Jörg Stückler,et al.  Semi-Supervised Deep Learning for Monocular Depth Map Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Niloy J. Mitra,et al.  Object Proposals Estimation in Depth Image Using Compact 3D Shape Manifolds , 2015, GCPR.

[30]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[31]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Jörg Stückler,et al.  Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors , 2016, GCPR.

[33]  Ian D. Reid,et al.  Shared shape spaces , 2011, 2011 International Conference on Computer Vision.

[34]  Daniel Cremers,et al.  Direct Sparse Odometry , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Andreas Geiger,et al.  Efficient Large-Scale Stereo Matching , 2010, ACCV.

[36]  Ian D. Reid,et al.  Simultaneous Monocular 2D Segmentation, 3D Pose Recovery and 3D Reconstruction , 2012, ACCV.

[37]  Jörg Stückler,et al.  SAMP: Shape and Motion Priors for 4D Vehicle Reconstruction , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[38]  Sanja Fidler,et al.  Monocular 3D Object Detection for Autonomous Driving , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Jörg Stückler,et al.  Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry , 2018, ECCV.

[40]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[43]  Andreas Geiger,et al.  Exploiting Object Similarity in 3D Reconstruction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[44]  James M. Rehg,et al.  3D-RCNN: Instance-Level 3D Object Reconstruction via Render-and-Compare , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.