Unsupervised Learning of 3D Structure from Images

A key goal of computer vision is to recover the underlying 3D structure from 2D observations of the world. In this paper we learn strong deep generative models of 3D structures, and recover these structures from 3D and 2D images via probabilistic inference. We demonstrate high-quality samples and report log-likelihoods on several datasets, including ShapeNet [2], and establish the first benchmarks in the literature. We also show how these models and their inference networks can be trained end-to-end from 2D images. This demonstrates for the first time the feasibility of learning to infer 3D representations of the world in a purely unsupervised manner.

[1]  Corporate OpenGL Architecture ReviewBoard OpenGL reference manual: the official reference document for OpenGL, release 1 , 1992 .

[2]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[3]  Dave Shreiner OpenGL Reference Manual: The Official Reference Document to OpenGL, Version 1.2 , 1999 .

[4]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[5]  Michael I. Jordan,et al.  An Introduction to Variational Methods for Graphical Models , 1999, Machine Learning.

[6]  Szymon Rusinkiewicz,et al.  Modeling by example , 2004, SIGGRAPH 2004.

[7]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[8]  Ruslan Salakhutdinov,et al.  On the quantitative analysis of deep belief networks , 2008, ICML '08.

[9]  R. Sundareswara,et al.  Perceptual multistability predicted by search model for Bayesian decisions. , 2008, Journal of vision.

[10]  Geoffrey E. Hinton,et al.  3D Object Recognition with Deep Belief Nets , 2009, NIPS.

[11]  Noah D. Goodman,et al.  Nonstandard Interpretations of Probabilistic Programs for Efficient Inference , 2011, NIPS.

[12]  Leonidas J. Guibas,et al.  Probabilistic reasoning for assembly-based 3D modeling , 2011, ACM Trans. Graph..

[13]  Siddhartha Chaudhuri,et al.  A probabilistic model for component-based shape synthesis , 2012, ACM Trans. Graph..

[14]  Daniel Fried,et al.  Bayesian geometric modeling of indoor scenes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Andrew Y. Ng,et al.  Convolutional-Recursive Deep Learning for 3D Object Classification , 2012, NIPS.

[16]  Joshua B. Tenenbaum,et al.  Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs , 2013, NIPS.

[17]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[18]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[19]  Daan Wierstra,et al.  Deep AutoRegressive Networks , 2013, ICML.

[20]  Michael J. Black,et al.  OpenDR: An Approximate Differentiable Renderer , 2014, ECCV.

[21]  Tejas D. Kulkarni,et al.  Deep Generative Vision as Approximate Bayesian Computation , 2014 .

[22]  Joshua B. Tenenbaum,et al.  Inverse Graphics with Probabilistic CAD Models , 2014, ArXiv.

[23]  Max Welling,et al.  Transformation Properties of Learned Visual Representations , 2014, ICLR.

[24]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Jiajun Wu,et al.  Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning , 2015, NIPS.

[26]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[28]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[29]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[30]  Thomas Brox,et al.  Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Joshua B. Tenenbaum,et al.  Picture: A probabilistic programming language for scene perception , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[33]  Daan Wierstra,et al.  One-Shot Generalization in Deep Generative Models , 2016, ICML.

[34]  Andriy Mnih,et al.  Variational Inference for Monte Carlo Objectives , 2016, ICML.

[35]  Geoffrey E. Hinton,et al.  Attend, Infer, Repeat: Fast Scene Understanding with Generative Models , 2016, NIPS.

[36]  Ruslan Salakhutdinov,et al.  Importance Weighted Autoencoders , 2015, ICLR.

[37]  Jitendra Malik,et al.  View Synthesis by Appearance Flow , 2016, ECCV.

[38]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[39]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[40]  Daan Wierstra,et al.  Towards Conceptual Compression , 2016, NIPS.