Multiview Aggregation for Learning Category-Specific Shape Reconstruction

We investigate the problem of learning category-specific 3D shape reconstruction from a variable number of RGB views of previously unobserved object instances. Most approaches for multiview shape reconstruction operate on sparse shape representations, or assume a fixed number of views. We present a method that can estimate dense 3D shape, and aggregate shape across multiple and varying number of input views. Given a single input view of an object instance, we propose a representation that encodes the dense shape of the visible object surface as well as the surface behind line of sight occluded by the visible surface. When multiple input views are available, the shape representation is designed to be aggregated into a single 3D shape using an inexpensive union operation. We train a 2D CNN to learn to predict this representation from a variable number of views (1 or more). We further aggregate multiview information by using permutation equivariant layers that promote order-agnostic view information exchange at the feature level. Experiments show that our approach is able to produce dense 3D reconstructions of objects that improve in quality as more views are added.

[1]  Xi Zhou,et al.  Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network , 2018, ECCV.

[2]  Jitendra Malik,et al.  Learning a Multi-View Stereo Machine , 2017, NIPS.

[3]  Alex Pentland,et al.  Shape Information From Shading: A Theory About Human Perception , 1988, [1988 Proceedings] Second International Conference on Computer Vision.

[4]  Stefan Roth,et al.  Matryoshka Networks: Predicting 3D Geometry via Nested Shape Layers , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Jitendra Malik,et al.  Learning Category-Specific Mesh Reconstruction from Image Collections , 2018, ECCV.

[6]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Cass W. Everitt,et al.  Interactive Order-Independent Transparency , 2001 .

[8]  Barnabás Póczos,et al.  Equivariance Through Parameter-Sharing , 2017, ICML.

[9]  Didier Stricker,et al.  Learning 3D Shapes as Multi-Layered Height-maps using 2D Convolutional Networks , 2018, ECCV.

[10]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[11]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Chen Kong,et al.  Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction , 2017, AAAI.

[13]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[14]  Andrew W. Fitzgibbon,et al.  Exploiting uncertainty in regression forests for accurate camera relocalization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Lu Fang,et al.  SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[17]  Andrew W. Fitzgibbon,et al.  What Shape Are Dolphins? Building 3D Morphable Models from 2D Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Jiajun Wu,et al.  MarrNet: 3D Shape Reconstruction via 2.5D Sketches , 2017, NIPS.

[19]  Jitendra Malik,et al.  Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Alexander J. Smola,et al.  Deep Sets , 2017, 1703.06114.

[21]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[22]  Narendra Ahuja,et al.  DeepMVS: Learning Multi-view Stereopsis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Stefan Leutenegger,et al.  X-Section: Cross-Section Prediction for Enhanced RGB-D Fusion , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Alexey Dosovitskiy,et al.  Unsupervised Learning of Shape and Pose with Differentiable Point Clouds , 2018, NeurIPS.

[25]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Yaron Lipman,et al.  Invariant and Equivariant Graph Networks , 2018, ICLR.

[27]  Andrew Zisserman,et al.  SilNet : Single- and Multi-View Reconstruction by Learning from Silhouettes , 2017, BMVC.

[28]  Barnabás Póczos,et al.  Deep Learning with Sets and Point Clouds , 2016, ICLR.

[29]  Derek Hoiem,et al.  Pixels, Voxels, and Views: A Study of Shape Representations for Single View 3D Object Shape Prediction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Abd El Rahman Shabayek,et al.  A survey on Deep Learning Advances on Different 3D Data Representations , 2018, 1808.01462.

[31]  Leonidas J. Guibas,et al.  Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Steven J. Gortler,et al.  Geometry images , 2002, SIGGRAPH.

[33]  Simon Lucey,et al.  Object-Centric Photometric Bundle Adjustment with Deep Shape Prior , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[34]  Vittorio Ferrari,et al.  Learning Single-Image 3D Reconstruction by Generative Modelling of Shape, Pose and Shading , 2019, International Journal of Computer Vision.

[35]  Mathieu Aubry,et al.  AtlasNet: A Papier-M\^ach\'e Approach to Learning 3D Surface Generation , 2018, CVPR 2018.

[36]  Jiajun Wu,et al.  Learning to Reconstruct Shapes from Unseen Classes , 2018, NeurIPS.

[37]  Frédo Durand,et al.  Burst Image Deblurring Using Permutation Invariant Convolutional Neural Networks , 2018, ECCV.

[38]  Lourdes Agapito,et al.  Reconstructing PASCAL VOC , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Radu Bogdan Rusu,et al.  3D is here: Point Cloud Library (PCL) , 2011, 2011 IEEE International Conference on Robotics and Automation.

[40]  Silvio Savarese,et al.  Weakly Supervised 3D Reconstruction with Adversarial Constraint , 2017, 2017 International Conference on 3D Vision (3DV).

[41]  Jitendra Malik,et al.  Category-specific object reconstruction from a single image , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Kostas Daniilidis,et al.  Equivariant Multi-View Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Subhajit Sanyal,et al.  Multilevel modelling and rendering of architectural scenes , 2003, Eurographics.

[44]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[45]  Richard Szeliski,et al.  Layered depth images , 1998, SIGGRAPH.

[46]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[47]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.