Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer

Many machine learning models operate on images, but ignore the fact that images are 2D projections formed by 3D geometry interacting with light, in a process called rendering. Enabling ML models to understand image formation might be key for generalization. However, due to an essential rasterization step involving discrete assignment operations, rendering pipelines are non-differentiable and thus largely inaccessible to gradient-based ML techniques. In this paper, we present {\emph DIB-R}, a differentiable rendering framework which allows gradients to be analytically computed for all pixels in an image. Key to our approach is to view foreground rasterization as a weighted interpolation of local properties and background rasterization as a distance-based aggregation of global geometry. Our approach allows for accurate optimization over vertex positions, colors, normals, light directions and texture coordinates through a variety of lighting models. We showcase our approach in two ML applications: single-image 3D object prediction, and 3D textured object generation, both trained using exclusively using 2D supervision. Our project website is: this https URL

[1]  Jaakko Lehtinen,et al.  Differentiable Monte Carlo ray tracing through edge sampling , 2018, ACM Trans. Graph..

[2]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[3]  Shunyu Yao,et al.  3D-Aware Scene Manipulation via Inverse Graphics , 2018, NeurIPS.

[4]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[5]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[6]  William T. Freeman,et al.  Unsupervised Training for 3D Morphable Model Regression , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  David Meger,et al.  Improved Adversarial Systems for 3D Object Generation and Reconstruction , 2017, CoRL.

[8]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[9]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[10]  Tatsuya Harada,et al.  Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Frank D. Luna,et al.  Introduction to 3D Game Programming with DirectX 11 , 2008 .

[12]  Tom Davis,et al.  Opengl programming guide: the official guide to learning opengl , 1993 .

[13]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Attila Szabó,et al.  Unsupervised 3D Shape Learning from Image Collections in the Wild , 2018, ArXiv.

[15]  Jiajun Wu,et al.  Learning Shape Priors for Single-View 3D Completion and Reconstruction , 2018, ECCV.

[16]  Alec Jacobson,et al.  Paparazzi , 2018, ACM Trans. Graph..

[17]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[18]  David Meger,et al.  GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects , 2019, ICML.

[19]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[20]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[21]  Leonidas J. Guibas,et al.  Learning Representations and Generative Models for 3D Point Clouds , 2017, ICML.

[22]  Silvio Savarese,et al.  Beyond PASCAL: A benchmark for 3D object detection in the wild , 2014, IEEE Winter Conference on Applications of Computer Vision.

[23]  Jitendra Malik,et al.  Learning Category-Specific Mesh Reconstruction from Image Collections , 2018, ECCV.

[24]  Bo Yang,et al.  3D Object Reconstruction from a Single Depth View with Adversarial Learning , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[25]  Daniel Cohen-Or,et al.  Pix2Vex: Image-to-Geometry Reconstruction using a Smooth Differentiable Renderer , 2019, ArXiv.

[26]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[27]  Alexey Dosovitskiy,et al.  Unsupervised Learning of Shape and Pose with Differentiable Point Clouds , 2018, NeurIPS.

[28]  Tianqi Chen,et al.  Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.

[29]  Gavin S. P. Miller,et al.  Hierarchical Z-buffer visibility , 1993, SIGGRAPH.

[30]  Chun-Liang Li,et al.  Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer , 2018, ICLR.

[31]  Thomas Brox,et al.  What Do Single-View 3D Reconstruction Networks Learn? , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Hao Li,et al.  Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[34]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Vittorio Ferrari,et al.  Learning to Generate and Reconstruct 3D Meshes with only 2D Supervision , 2018, BMVC.

[36]  Pat Hanrahan,et al.  An efficient representation for irradiance environment maps , 2001, SIGGRAPH.

[37]  Michael J. Black,et al.  OpenDR: An Approximate Differentiable Renderer , 2014, ECCV.

[38]  Hao Li,et al.  Soft Rasterizer: Differentiable Rendering for Unsupervised Single-View Mesh Reconstruction , 2019, ArXiv.

[39]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[40]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[42]  Bui Tuong Phong Illumination for computer generated pictures , 1975, Commun. ACM.

[43]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[44]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.