Paper Information - Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency

Approaches to single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry. We avoid all of these supervisions and hypotheses by explicitly leveraging the consistency between images of different object instances. As a result, our method can learn from large collections of unlabelled images depicting the same object category. Our main contributions are two approaches to leverage cross-instance consistency: (i) progressive conditioning, a training strategy to gradually specialize the model from category to instances in a curriculum learning fashion; (ii) swap reconstruction, a loss enforcing consistency between instances having similar shape or texture. Also critical to the success of our method are: our structured autoencoding architecture decomposing an image into explicit shape, texture, pose, and background; an adapted formulation of differential rendering; and a new optimization scheme alternating between 3D and pose learning. We compare our approach, UNICORN, both on the diverse synthetic ShapeNet dataset, the classical benchmark for methods requiring multiple views as supervision, and on standard real-image benchmarks (Pascal3D+ Car, CUB-200) for which most methods require known templates and silhouette annotations. We also showcase applicability to more challenging real-world collections (CompCars, LSUN), where silhouettes are not available and images are not cropped around the object. The results demonstrate high-quality and realistic textured 3D reconstructions.
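The two cross-instance ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not give the mechanics, so everything below is an assumption made for illustration: the doubling schedule in `active_dims`, the masking in `condition`, and the neighbor rule in `swap_partner` are hypothetical names and choices, not the authors' implementation.

```python
import math


def active_dims(step, total_dim, steps_per_stage):
    """Number of latent dimensions left unmasked at a given training step.

    Assumed schedule (not stated in the abstract): start with 0 active
    dimensions, so every instance shares one category-level code, then
    double the count at each stage until the full code is in use.
    """
    stage = step // steps_per_stage
    if stage == 0:
        return 0
    return min(total_dim, 2 ** (stage - 1))


def condition(code, n_active):
    """Zero out all but the first n_active entries of a latent code."""
    return [v if i < n_active else 0.0 for i, v in enumerate(code)]


def swap_partner(query_idx, shape_codes, texture_codes, min_texture_gap):
    """Pick a neighbor for a swap-style consistency loss.

    Hypothetical rule: among instances whose texture code is at least
    min_texture_gap away from the query's, return the index of the one
    with the closest shape code (None if no candidate qualifies).
    """
    best, best_dist = None, math.inf
    for j in range(len(shape_codes)):
        if j == query_idx:
            continue
        texture_gap = math.dist(texture_codes[query_idx], texture_codes[j])
        if texture_gap < min_texture_gap:
            continue  # too similar in texture to be an informative swap
        shape_dist = math.dist(shape_codes[query_idx], shape_codes[j])
        if shape_dist < best_dist:
            best, best_dist = j, shape_dist
    return best
```

In a training loop built on this sketch, `condition(code, active_dims(step, ...))` would be applied to the shape and texture codes before decoding, and the shape code of the instance returned by `swap_partner` would replace the query's own shape code in an extra reconstruction term. Both uses are one plausible reading of the abstract and should be checked against the paper.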

Alexei A. Efros | Mathieu Aubry | Matthew Fisher | Tom Monnier
