Paper Information - Learning 3D Shape Completion Under Weak Supervision

Learning 3D Shape Completion Under Weak Supervision

We address the problem of 3D shape completion from sparse and noisy point clouds, a fundamental problem in computer vision and robotics. Recent approaches are either data-driven or learning-based: data-driven approaches rely on a shape model whose parameters are optimized to fit the observations; learning-based approaches, in contrast, avoid the expensive optimization step by learning to directly predict complete shapes from incomplete observations in a fully supervised setting. However, full supervision is often not available in practice. In this work, we propose a weakly supervised, learning-based approach to 3D shape completion that requires neither slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, i.e., learn, maximum likelihood fitting using deep neural networks, resulting in efficient shape completion without sacrificing accuracy. On synthetic benchmarks based on ShapeNet (Chang et al., ShapeNet: An Information-Rich 3D Model Repository, 2015, arXiv:1512.03012) and ModelNet (Wu et al., CVPR 2015), as well as on real robotics data from KITTI (Geiger et al., CVPR 2012) and Kinect (Yang et al., 3D Object Dense Reconstruction from a Single Depth View, 2018, arXiv:1802.00411), we demonstrate that the proposed amortized maximum likelihood approach is able to compete with the fully supervised baseline of Dai et al. (CVPR 2017) and outperforms the data-driven approach of Engelmann et al. (GCPR 2016), while requiring less supervision and being significantly faster.
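The idea of amortizing maximum likelihood fitting can be illustrated with a toy objective. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical fixed linear-sigmoid decoder standing in for the learned shape prior, a hypothetical linear encoder, a Bernoulli occupancy likelihood over observed voxels, and a standard Gaussian prior on the latent code. All function names and the `None` convention for unobserved voxels are assumptions made for illustration.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode(z, dec_weights, dec_bias):
    """Hypothetical fixed decoder: latent code z -> occupancy probability per voxel."""
    return [sigmoid(sum(w * zj for w, zj in zip(row, z)) + b)
            for row, b in zip(dec_weights, dec_bias)]


def encode(observation, enc_weights):
    """Hypothetical linear encoder: partial observation -> latent code z.

    Occupied voxels are fed in as +1, free voxels as -1, unobserved voxels as 0.
    """
    x = [0.0 if y is None else (1.0 if y == 1 else -1.0) for y in observation]
    return [sum(w * xi for w, xi in zip(row, x)) for row in enc_weights]


def aml_loss(z, observation, dec_weights, dec_bias):
    """Negative log-likelihood of the observed voxels plus a Gaussian prior on z.

    observation[i] is 1 (occupied), 0 (free), or None (unobserved, ignored),
    so no complete ground-truth shape enters the loss.
    """
    probs = decode(z, dec_weights, dec_bias)
    nll = 0.0
    for p, y in zip(probs, observation):
        if y is None:
            continue  # unobserved voxels contribute nothing
        nll -= math.log(p) if y == 1 else math.log(1.0 - p)
    prior = 0.5 * sum(zj * zj for zj in z)  # -ln N(z; 0, I) up to a constant
    return nll + prior


# Toy setup: 3 voxels, 2 latent dimensions; the middle voxel is unobserved.
dec_weights = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0]]
dec_bias = [0.0, 0.0, 0.0]
observation = [1, None, 0]

# An untrained (all-zero) encoder predicts z = 0 from the observation.
z_untrained = encode(observation, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
loss_untrained = aml_loss(z_untrained, observation, dec_weights, dec_bias)

# A code that explains the observed voxels better yields a lower loss.
loss_better = aml_loss([1.0, 0.0], observation, dec_weights, dec_bias)
```

In this sketch, training would adjust only the encoder weights to reduce `aml_loss` over many observations while the decoder stays fixed, so that at test time a single encoder pass replaces per-observation optimization.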

Andreas Geiger | David Stutz

[1] Eric Jones,et al. SciPy: Open Source Scientific Tools for Python , 2001 .

[2] Noah D. Goodman,et al. Amortized Inference in Probabilistic Reasoning , 2014, CogSci.

[3] Abhinav Gupta,et al. Learning a Predictable and Generative Vector Representation for Objects , 2016, ECCV.

[4] Carlos Hernandez,et al. Multi-View Stereo: A Tutorial , 2015, Found. Trends Comput. Graph. Vis..

[5] David M. Blei,et al. Variational Inference: A Review for Statisticians , 2016, ArXiv.

[6] Jörg Stückler,et al. SAMP: Shape and Motion Priors for 4D Vehicle Reconstruction , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[7] C. Lee Giles,et al. Learning a Hierarchical Latent-Variable Model of Voxelized 3D Shapes , 2017, ArXiv.

[8] Honglak Lee,et al. Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision , 2016, NIPS.

[9] Christopher K. I. Williams,et al. The shape variational autoencoder: A deep generative model of part‐segmented 3D objects , 2017, Comput. Graph. Forum.

[10] Jitendra Malik,et al. Category-specific object reconstruction from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Oliver Kroemer,et al. Point cloud completion using extrusions , 2012, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012).

[12] M. Abramowitz,et al. Handbook of Mathematical Functions With Formulas, Graphs and Mathematical Tables (National Bureau of Standards Applied Mathematics Series No. 55) , 1965 .

[13] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[14] Jiajun Wu,et al. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[15] Stefan Leutenegger,et al. ElasticFusion: Dense SLAM Without A Pose Graph , 2015, Robotics: Science and Systems.

[16] Paolo Cignoni,et al. MeshLab: an Open-Source Mesh Processing Tool , 2008, Eurographics Italian Chapter Conference.

[17] Ian D. Reid,et al. Nonlinear shape manifolds as shape priors in level set segmentation and tracking , 2011, CVPR 2011.

[18] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .

[19] Sebastian Thrun,et al. Shape from symmetry , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[20] Horst Bischof,et al. OctNetFusion: Learning Depth Fusion from Data , 2017, 2017 International Conference on 3D Vision (3DV).

[21] Min Bai,et al. TorontoCity: Seeing the World with a Million Eyes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[22] Lu Ma,et al. Unsupervised Dense Object Discovery, Detection, Tracking and Reconstruction , 2014, ECCV.

[23] Bo Yang,et al. 3D Object Dense Reconstruction from a Single Depth View , 2018, ArXiv.

[24] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25] Ian D. Reid,et al. Dense Reconstruction Using 3D Object Shape Priors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Silvio Savarese,et al. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[27] Duc Thanh Nguyen,et al. A Field Model for Repairing 3D Shapes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.

[29] Jitendra Malik,et al. Aligning 3D models to RGB-D images of cluttered scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Leonidas J. Guibas,et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Leonidas J. Guibas,et al. ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[32] Leonidas J. Guibas,et al. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[33] Niloy J. Mitra,et al. Object Proposals Estimation in Depth Image Using Compact 3D Shape Manifolds , 2015, GCPR.

[34] Peter V. Gehler,et al. 3D object class detection in the wild , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[35] Oliver Grau,et al. VConv-DAE: Deep Volumetric Shape Learning Without Object Labels , 2016, ECCV Workshops.

[36] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[37] Noah D. Goodman,et al. Deep Amortized Inference for Probabilistic Programs , 2016, ArXiv.

[38] Daniel Cremers,et al. Large-Scale Multi-resolution Surface Reconstruction from RGB-D Sequences , 2013, 2013 IEEE International Conference on Computer Vision.

[39] M. Abramowitz,et al. Handbook of Mathematical Functions, with Formulas, Graphs, and Mathematical Tables , 1966 .

[40] Joseph L. Mundy,et al. Predicting high resolution image edges with a generic, adaptive, 3-D vehicle model , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[41] Max Jaderberg,et al. Unsupervised Learning of 3D Structure from Images , 2016, NIPS.

[42] Bo Yang,et al. 3D Object Reconstruction from a Single Depth View with Adversarial Learning , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[43] Ian D. Reid,et al. Simultaneous Monocular 2D Segmentation, 3D Pose Recovery and 3D Reconstruction , 2012, ACCV.

[44] Marc Levoy,et al. A volumetric method for building complex models from range images , 1996, SIGGRAPH.

[45] Henrik Aanæs,et al. Large Scale Multi-view Stereopsis Evaluation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[46] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[47] Shakir Mohamed,et al. Variational Inference with Normalizing Flows , 2015, ICML.

[48] Derek Hoiem,et al. Completing 3D object shape from one depth image , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49] Anthony J. Yezzi,et al. Non-rigid 2D-3D pose estimation and 2D image segmentation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[50] Jianxiong Xiao,et al. Sliding Shapes for 3D Object Detection in Depth Images , 2014, ECCV.

[51] Marc Pollefeys,et al. Class Specific 3D Object Shape Priors Using Surface Normals , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[52] Bernt Schiele,et al. Detailed 3D Representations for Object Recognition and Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53] Ming-Ting Sun,et al. Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54] Yoshua Bengio,et al. Denoising Criterion for Variational Auto-Encoding Framework , 2015, AAAI.

[55] Chen Kong,et al. Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction , 2017, AAAI.

[56] Alexei A. Efros,et al. Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[57] Simon J. Julier,et al. Structured Prediction of Unobserved Voxels from a Single Depth Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58] William E. Lorensen,et al. Marching cubes: A high resolution 3D surface construction algorithm , 1987, SIGGRAPH.

[59] Jitendra Malik,et al. Hierarchical Surface Prediction for 3D Object Reconstruction , 2017, 2017 International Conference on 3D Vision (3DV).

[60] Gernot Riegler,et al. OctNet: Learning Deep 3D Representations at High Resolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61] Zygmunt Pizlo,et al. Human Perception of 3D Shapes , 2007, CAIP.

[62] Anthony J. Yezzi,et al. A Nonrigid Kernel-Based Framework for 2D-3D Pose Estimation and 2D Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63] Yuandong Tian,et al. Single Image 3D Interpreter Network , 2016, ECCV.

[64] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[65] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[66] Andreas Geiger,et al. Displets: Resolving stereo ambiguities using object knowledge , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67] Alexei A. Efros,et al. Multi-view Supervision for Single-View Reconstruction via Differentiable Ray Consistency , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[68] Jitendra Malik,et al. Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[69] Sanja Fidler,et al. 3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[70] Matthias Nießner,et al. Shape Completion Using 3D-Encoder-Predictor CNNs and Shape Synthesis , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[71] Tatsuya Harada,et al. Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[72] Thomas Brox,et al. Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[73] Daniel G. Aliaga,et al. Single viewpoint model completion of symmetric objects for digital inspection , 2011, Comput. Vis. Image Underst..

[74] Leonidas J. Guibas,et al. Example-Based 3D Scan Completion , 2005 .

[75] Hao Su,et al. A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[76] Silvio Savarese,et al. Dense Object Reconstruction with Semantic Priors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[77] Andrew W. Fitzgibbon,et al. KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[78] Rob Fergus,et al. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[79] Leonidas J. Guibas,et al. Database‐Assisted Object Retrieval for Real‐Time 3D Reconstruction , 2015, Comput. Graph. Forum.

[80] Nassir Navab,et al. Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[81] Leonidas J. Guibas,et al. Data-driven structural priors for shape completion , 2015, ACM Trans. Graph..

[82] Rob Fergus,et al. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[83] Jianxiong Xiao,et al. 3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[84] William E. Lorensen,et al. Marching cubes: A high resolution 3D surface construction algorithm , 1987 .

[85] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[86] Daniel Cohen-Or,et al. Non-local scan consolidation for 3D urban scenes , 2010, ACM Trans. Graph..

[87] Chad DeChant,et al. Shape completion enabled robotic grasping , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[88] Paul J. Besl,et al. A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[89] David Meger,et al. Improved Adversarial Systems for 3D Object Generation and Reconstruction , 2017, CoRL.

[90] Andreas Geiger,et al. Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[91] Zygmunt Pizlo,et al. 3D Shape - Its Unique Place in Visual Perception , 2008 .

[92] Jörg Stückler,et al. Joint Object Pose Estimation and Shape Reconstruction in Urban Street Scenes Using 3D Shape Priors , 2016, GCPR.

[93] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[94] Thomas Brox,et al. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation , 2016, MICCAI.

[95] Allen Tannenbaum,et al. A Nonrigid Kernel-Based Framework for 2D-3D Pose Estimation and 2D Image Segmentation , 2011 .

[96] Silvio Savarese,et al. Weakly Supervised Generative Adversarial Networks for 3D Reconstruction , 2017, ArXiv.

[97] Leonidas J. Guibas,et al. Discovering structural regularity in 3D geometry , 2008, ACM Trans. Graph..

[98] Zhen Li,et al. High Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference , 2017 .

[99] Ke Xie,et al. A search-classify approach for cluttered indoor scene understanding , 2012, ACM Trans. Graph..

[100] Chao Yang,et al. Shape Inpainting Using 3D Generative Adversarial Network and Recurrent Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[101] Theodore Lim,et al. Generative and Discriminative Voxel Modeling with Convolutional Neural Networks , 2016, ArXiv.

[102] Daniel Cremers,et al. A Review of Geometry Recovery from a Single Image Focusing on Curved Object Reconstruction , 2013, Innovations for Shape Analysis, Models and Algorithms.

[103] Konrad Schindler,et al. Are Cars Just 3D Boxes? Jointly Estimating the 3D Shape of Multiple Objects , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[104] Andreas Krause,et al. Advances in Neural Information Processing Systems (NIPS) , 2014 .

[105] Andreas Geiger,et al. Learning 3D Shape Completion from Laser Scan Data with Weak Supervision , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.