Learning 3D Shape Completion from Laser Scan Data with Weak Supervision

3D shape completion from partial point clouds is a fundamental problem in computer vision and computer graphics. Recent approaches can be characterized as either data-driven or learning-based. Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations. Learning-based approaches, in contrast, avoid the expensive optimization step and instead directly predict the complete shape from the incomplete observations using deep neural networks. However, they require full supervision, which is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which requires neither slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, i.e., learn, maximum likelihood fitting using deep neural networks, resulting in efficient shape completion without sacrificing accuracy. Tackling 3D shape completion of cars on ShapeNet [5] and KITTI [18], we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster. On ModelNet [49], we additionally show that the approach generalizes to other object categories.
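To make the idea of amortized maximum likelihood more concrete, the following is a minimal sketch of what such an objective could look like on a voxel grid. It is an illustration under stated assumptions, not the implementation used in this work: the linear-sigmoid toy decoder, the binary observation mask, the standard-normal prior on the latent code, and all names (`make_toy_decoder`, `amortized_ml_loss`, `prior_weight`) are hypothetical. In an amortized setting, the code `z` would be predicted by an encoder network from the partial observation, and the pretrained decoder (the shape prior) would be kept fixed.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def make_toy_decoder(weights):
    """Hypothetical stand-in for a fixed, pretrained shape prior.

    Maps a latent code z to per-voxel occupancy probabilities.
    """
    def decoder(z):
        return sigmoid(weights @ z)
    return decoder


def amortized_ml_loss(z, decoder, observed, mask, prior_weight=1.0):
    """Negative log-likelihood of the observed voxels plus a prior term on z.

    observed: per-voxel targets (1 = occupied, 0 = free space).
    mask:     1 where a voxel was actually observed, 0 where nothing is known.
    Only observed voxels contribute, so no complete ground-truth shape is needed.
    """
    probs = np.clip(decoder(z), 1e-7, 1.0 - 1e-7)
    # Bernoulli negative log-likelihood per voxel.
    nll = -(observed * np.log(probs) + (1.0 - observed) * np.log(1.0 - probs))
    data_term = np.sum(nll * mask)
    # Negative log-density of a standard normal prior, up to a constant.
    prior_term = 0.5 * np.sum(z ** 2)
    return data_term + prior_weight * prior_term


rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 2))   # 8 voxels, 2 latent dimensions (toy sizes)
decoder = make_toy_decoder(weights)

observed = np.array([1, 0, 1, 0, 0, 0, 0, 0], dtype=float)
mask = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=float)  # only 3 voxels observed

z = np.zeros(2)  # in practice: z = encoder(partial_observation)
loss = amortized_ml_loss(z, decoder, observed, mask)
print(loss)  # 3 * ln(2), about 2.079: every probability is 0.5 at z = 0
```

During training, such a loss would be minimized with respect to the encoder parameters only, so that a single forward pass replaces the per-shape optimization of data-driven fitting.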

[1] Noah D. Goodman, et al. Deep Amortized Inference for Probabilistic Programs, 2016, ArXiv.

[2] Andreas Geiger, et al. Are we ready for autonomous driving? The KITTI vision benchmark suite, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3] Ian D. Reid, et al. Dense Reconstruction Using 3D Object Shape Priors, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[4] David Meger, et al. Improved Adversarial Systems for 3D Object Generation and Reconstruction, 2017, CoRL.

[5] Andreas Geiger, et al. Learning 3D Shape Completion Under Weak Supervision, 2018, International Journal of Computer Vision.

[6] Andreas Geiger, et al. Object scene flow for autonomous vehicles, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Zygmunt Pizlo, et al. 3D Shape - Its Unique Place in Visual Perception, 2008.

[8] Dilin Wang, et al. Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning, 2016, ArXiv.

[9] Noah D. Goodman, et al. Amortized Inference in Probabilistic Reasoning, 2014, CogSci.

[10] Jörg Stückler, et al. Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors, 2016, GCPR.

[11] Milton Abramowitz, et al. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 1964.

[12] Ke Xie, et al. A search-classify approach for cluttered indoor scene understanding, 2012, ACM Trans. Graph.

[13] Andreas Geiger, et al. Displets: Resolving stereo ambiguities using object knowledge, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Leonidas J. Guibas, et al. Discovering structural regularity in 3D geometry, 2008, SIGGRAPH 2008.

[15] Theodore Lim, et al. Generative and Discriminative Voxel Modeling with Convolutional Neural Networks, 2016, ArXiv.

[16] Thomas Brox, et al. Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17] Daniel G. Aliaga, et al. Single viewpoint model completion of symmetric objects for digital inspection, 2011, Comput. Vis. Image Underst.

[18] Jianxiong Xiao, et al. 3D ShapeNets: A deep representation for volumetric shapes, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Andreas Geiger, et al. Vision meets robotics: The KITTI dataset, 2013, Int. J. Robotics Res.

[20] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.

[21] Abhinav Gupta, et al. Learning a Predictable and Generative Vector Representation for Objects, 2016, ECCV.

[22] Simon J. Julier, et al. Structured Prediction of Unobserved Voxels from a Single Depth Image, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Ming-Ting Sun, et al. Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Oliver Kroemer, et al. Point cloud completion using extrusions, 2012, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012).

[25] Jiajun Wu, et al. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, 2016, NIPS.

[26] Daniel Cremers, et al. Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences, 2013, 2013 IEEE International Conference on Computer Vision.

[27] Leonidas J. Guibas, et al. ShapeNet: An Information-Rich 3D Model Repository, 2015, ArXiv.

[28] Jitendra Malik, et al. Aligning 3D models to RGB-D images of cluttered scenes, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Max Jaderberg, et al. Unsupervised Learning of 3D Structure from Images, 2016, NIPS.

[30] Zhen Li, et al. High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31] Oliver Grau, et al. VConv-DAE: Deep Volumetric Shape Learning Without Object Labels, 2016, ECCV Workshops.

[32] Leonidas J. Guibas, et al. Example-Based 3D Scan Completion, 2005.

[33] Paolo Cignoni, et al. MeshLab: an Open-Source Mesh Processing Tool, 2008, Eurographics Italian Chapter Conference.

[34] Leonidas J. Guibas, et al. Database‐Assisted Object Retrieval for Real‐Time 3D Reconstruction, 2015, Comput. Graph. Forum.

[35] Peter Shirley, et al. An Efficient and Robust Ray-Box Intersection Algorithm, 2005, J. Graph. Tools.

[36] Sebastian Thrun, et al. Shape from symmetry, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[37] Horst Bischof, et al. OctNetFusion: Learning Depth Fusion from Data, 2017, 2017 International Conference on 3D Vision (3DV).

[38] Carlos Hernandez, et al. Multi-View Stereo: A Tutorial, 2015, Found. Trends Comput. Graph. Vis.

[39] Jörg Stückler, et al. SAMP: Shape and Motion Priors for 4D Vehicle Reconstruction, 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[40] Tomas Akenine-Möller, et al. Fast, minimum storage ray/triangle intersection, 1997, J. Graphics, GPU, & Game Tools.

[41] Clément Farabet, et al. Torch7: A Matlab-like Environment for Machine Learning, 2011, NIPS 2011.

[42] Henrik Aanæs, et al. Large Scale Multi-view Stereopsis Evaluation, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[43] Hao Su, et al. A Point Set Generation Network for 3D Object Reconstruction from a Single Image, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44] Bo Yang, et al. 3D Object Dense Reconstruction from a Single Depth View, 2018, ArXiv.

[45] M. Abramowitz, et al. Handbook of Mathematical Functions With Formulas, Graphs and Mathematical Tables (National Bureau of Standards Applied Mathematics Series No. 55), 1965.

[46] Sanja Fidler, et al. 3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47] Matthias Nießner, et al. Shape Completion Using 3D-Encoder-Predictor CNNs and Shape Synthesis, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Michael E. Tipping, et al. Probabilistic Principal Component Analysis, 1999.

[49] Nassir Navab, et al. Deeper Depth Prediction with Fully Convolutional Residual Networks, 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[50] Leonidas J. Guibas, et al. Data-driven structural priors for shape completion, 2015, ACM Trans. Graph.

[51] Rob Fergus, et al. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network, 2014, NIPS.

[52] Jitendra Malik, et al. Hierarchical Surface Prediction for 3D Object Reconstruction, 2017, 2017 International Conference on 3D Vision (3DV).

[53] Zygmunt Pizlo, et al. Human Perception of 3D Shapes, 2007, CAIP.

[54] Niloy J. Mitra, et al. Non-local scan consolidation for 3D urban scenes, 2010, SIGGRAPH 2010.

[55] Silvio Savarese, et al. Dense Object Reconstruction with Semantic Priors, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[56] Rob Fergus, et al. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[57] Shakir Mohamed, et al. Variational Inference with Normalizing Flows, 2015, ICML.