Inverse Graphics with Probabilistic CAD Models

Recently, multiple formulations of vision problems as probabilistic inversions of generative models based on computer graphics have been proposed. However, applications to 3D perception from natural images have focused on low-dimensional latent scenes, due to challenges in both modeling and inference. Accounting for the enormous variability in 3D object shape and 2D appearance via realistic generative models seems intractable, as does inverting even simple versions of the many-to-many computations that link 3D scenes to 2D images. This paper proposes and evaluates an approach that addresses key aspects of both these challenges. We show that it is possible to solve challenging, real-world 3D vision problems by approximate inference in generative models for images based on rendering the outputs of probabilistic CAD (PCAD) programs. Our PCAD object geometry priors generate deformable 3D meshes corresponding to plausible objects and apply affine transformations to place them in a scene. Image likelihoods are based on similarity in a feature space based on standard mid-level image representations from the vision literature. Our inference algorithm integrates single-site and locally blocked Metropolis-Hastings proposals, Hamiltonian Monte Carlo and discriminative data-driven proposals learned from training data generated from our models. We apply this approach to 3D human pose estimation and object shape reconstruction from single images, achieving quantitative and qualitative performance improvements over state-of-the-art baselines.

[1]  Yura N. Perov,et al.  Venture: a higher-order probabilistic programming platform with programmable inference , 2014, ArXiv.

[2]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[3]  Jitendra Malik,et al.  Shape, Illumination, and Reflectance from Shading , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Bruce G. Baumgart,et al.  Geometric modeling for computer vision. , 1974 .

[5]  George Papandreou,et al.  Modeling Image Patches with a Generic Dictionary of Mini-epitomes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  O. François,et al.  Approximate Bayesian Computation (ABC) in practice. , 2010, Trends in ecology & evolution.

[7]  Joshua B. Tenenbaum,et al.  Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs , 2013, NIPS.

[8]  Harry Shum,et al.  Image segmentation by data driven Markov chain Monte Carlo , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[9]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[10]  Sebastian Nowozin,et al.  The informed sampler: A discriminative approach to Bayesian inference in generative computer vision models , 2014, Comput. Vis. Image Underst..

[11]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[12]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[13]  Cordelia Schmid,et al.  Estimating Human Pose with Flowing Puppets , 2013, 2013 IEEE International Conference on Computer Vision.

[14]  A. Yuille,et al.  Opinion TRENDS in Cognitive Sciences Vol.10 No.7 July 2006 Special Issue: Probabilistic models of cognition Vision as Bayesian inference: analysis by synthesis? , 2022 .

[15]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[16]  Eero P. Simoncelli,et al.  A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients , 2000, International Journal of Computer Vision.

[17]  Jianxiong Xiao,et al.  Localizing 3D cuboids in single-view images , 2012, NIPS.

[18]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[19]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.