Deep Generative Vision as Approximate Bayesian Computation

Probabilistic formulations of inverse graphics have recently been proposed for a variety of 2D and 3D vision problems [15, 12, 14, 9]. These approaches represent visual elements in the form of graphics simulators that produce approximate renderings of the visual scenes. Existing approaches model either pixel data or hand-crafted intermediate representations such as edge maps, super-pixels, and silhouettes. However, the choice of features can drastically affect inference quality and run-time. Recently, deep learning techniques such as Convolutional Neural Networks (CNNs) have demonstrated impressive performance on tasks such as object recognition and scene pixel labeling, suggesting that CNN-based features may outperform hand-crafted ones. Encouraged by these findings, we test the ability of CNNs, in combination with Approximate Bayesian Computation (ABC), to invert high-dimensional generative inverse graphics models from single images. We successfully apply a variant of the probabilistic approximate MCMC algorithm [21], which uses a CNN to compute summary statistics, to two real-world problems: inferring the 3D pose of humans and generative face analysis from single images. Computer graphics seems to be advancing at a great pace in designing solutions for hard image synthesis problems. Our experiments indicate that the combination of rich probabilistic inverse graphics models and deep learning approaches could use such simulators directly to solve the hard inversion problem.
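The general idea of ABC-style MCMC over a simulator, with proposals scored by the distance between summary statistics of rendered and observed images, can be illustrated with a minimal sketch. This is not the algorithm of [21]. The names `render`, `summary`, `log_kernel`, and `abc_mcmc` are hypothetical. The "simulator" is a deterministic toy 1D renderer, and the summary function is a hand-written stand-in for CNN features. The sketch assumes a uniform prior, a symmetric random-walk proposal, and a Gaussian ABC kernel with tolerance `eps`.

```python
import math
import random


def render(theta, width=32):
    """Toy 'graphics simulator' (assumption): 1D image with a bump at theta."""
    return [math.exp(-0.5 * ((x - theta) / 2.0) ** 2) for x in range(width)]


def summary(image):
    """Hand-written stand-in for CNN features: centroid and total mass."""
    mass = sum(image)
    centroid = sum(x * v for x, v in enumerate(image)) / mass
    return (centroid, mass)


def log_kernel(s_sim, s_obs, eps):
    """Log of a Gaussian ABC kernel on the summary-statistic distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(s_sim, s_obs))
    return -0.5 * d2 / eps ** 2


def abc_mcmc(observed, n_steps=2000, step=1.0, eps=0.5,
             lo=0.0, hi=31.0, seed=0):
    """Random-walk Metropolis over theta with a uniform prior on [lo, hi]."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    theta = rng.uniform(lo, hi)
    log_k = log_kernel(summary(render(theta)), s_obs, eps)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        # Proposals outside the prior support are rejected outright.
        if lo <= proposal <= hi:
            log_k_new = log_kernel(summary(render(proposal)), s_obs, eps)
            # 1 - random() lies in (0, 1], so the log is always defined.
            if math.log(1.0 - rng.random()) < log_k_new - log_k:
                theta, log_k = proposal, log_k_new
        samples.append(theta)
    return samples


if __name__ == "__main__":
    observed = render(12.0)
    samples = abc_mcmc(observed)
    kept = samples[len(samples) // 2:]  # discard the first half as burn-in
    print(sum(kept) / len(kept))
```

In this toy setting the chain should concentrate near the latent value used to render the observation. In the setting described above, `render` would be a graphics simulator and `summary` a CNN feature extractor, both far more expensive to evaluate.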

[1]  Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.

[2]  Thomas Vetter, et al. A morphable model for the synthesis of 3D faces, 1999, SIGGRAPH.

[3]  Zhuowen Tu, et al. Image Parsing: Unifying Segmentation, Detection, and Recognition, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4]  S. Hochstein, et al. The reverse hierarchy theory of visual perceptual learning, 2004, Trends in Cognitive Sciences.

[5]  B. Caputo, et al. Recognizing human actions: a local SVM approach, 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.

[6]  Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.

[7]  A. P. Dawid, et al. Regression and Classification Using Gaussian Process Priors, 2009.

[8]  Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Ryan P. Adams, et al. Elliptical slice sampling, 2009, AISTATS.

[10]  Song-Chun Zhu, et al. Image Parsing via Stochastic Scene Grammar, 2011.

[11]  Noah D. Goodman, et al. Nonstandard Interpretations of Probabilistic Programs for Efficient Inference, 2011, NIPS.

[12]  Yi Yang, et al. Articulated pose estimation with flexible mixtures-of-parts, 2011, CVPR 2011.

[13]  Michael J. Black, et al. Coregistration: Simultaneous Alignment and Modeling of Articulated 3D Shape, 2012, ECCV.

[14]  Yann LeCun, et al. Scene parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, 2012, ICML.

[15]  Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[16]  Daniel Fried, et al. Bayesian geometric modeling of indoor scenes, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  R. Wilkinson. Approximate Bayesian computation (ABC) gives exact results under the assumption of model error, 2008, Statistical applications in genetics and molecular biology.

[18]  Joshua B. Tenenbaum, et al. Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs, 2013, NIPS.

[19]  R. Fergus, et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, 2013, ICLR.

[20]  Michael J. Black, et al. OpenDR: An Approximate Differentiable Renderer, 2014, ECCV.

[21]  Joshua B. Tenenbaum, et al. Inverse Graphics with Probabilistic CAD Models, 2014, ArXiv.

[22]  Sebastian Nowozin, et al. The informed sampler: A discriminative approach to Bayesian inference in generative computer vision models, 2014, Comput. Vis. Image Underst.