Picture: A probabilistic programming language for scene perception

Recent progress on probabilistic modeling and statistical learning, coupled with the availability of large training datasets, has led to remarkable progress in computer vision. Generative probabilistic models, or “analysis-by-synthesis” approaches, can capture rich scene structure but have been less widely applied than their discriminative counterparts, as they often require considerable problem-specific engineering in modeling and inference, and inference is typically seen as requiring slow, hypothesize-and-test Monte Carlo methods. Here we present Picture, a probabilistic programming language for scene understanding that allows researchers to express complex generative vision models, while automatically solving them using fast general-purpose inference machinery. Picture provides a stochastic scene language that can express generative models for arbitrary 2D/3D scenes, as well as a hierarchy of representation layers for comparing scene hypotheses with observed images by matching not simply pixels, but also more abstract features (e.g., contours, deep neural network activations). Inference can flexibly integrate advanced Monte Carlo strategies with fast bottom-up data-driven methods. Thus both representations and inference strategies can build directly on progress in discriminatively trained systems to make generative vision more robust and efficient. We use Picture to write programs for 3D face analysis, 3D human pose estimation, and 3D object reconstruction - each competitive with specially engineered baselines.

[1]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[2]  Ulf Grenander,et al.  Hands: A Pattern Theoretic Study of Biological Shapes , 1990 .

[3]  Ulf Grenander,et al.  General Pattern Theory: A Mathematical Study of Regular Structures , 1993 .

[4]  Geoffrey E. Hinton,et al.  The Helmholtz Machine , 1995, Neural Computation.

[5]  Geoffrey E. Hinton,et al.  The "wake-sleep" algorithm for unsupervised neural networks. , 1995, Science.

[6]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[7]  G. Stiny Shape , 1999 .

[8]  Zhuowen Tu,et al.  Image Segmentation by Data-Driven Markov Chain Monte Carlo , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Björn Stenger,et al.  Shape context and chamfer matching in cluttered scenes , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[10]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[11]  B. Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[12]  Stuart Geman,et al.  Context and Hierarchy in a Probabilistic Image Model , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[14]  Sangyop Lee Boundary structure of hyperbolic 3-manifolds admitting annular fillings at large distance , 2006 .

[15]  Mun Wai Lee,et al.  A model-based approach for estimating human 3D poses in static images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[17]  Martin D. Levine,et al.  State-of-the-art of 3D facial reconstruction methods for face recognition based on a single 2D training image per person , 2009, Pattern Recognit. Lett..

[18]  Michael J. Black,et al.  Estimating human shape and pose from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  Radford M. Neal Regression and Classification Using Gaussian Process Priors , 2009 .

[20]  Sami Romdhani,et al.  A 3D Face Model for Pose and Illumination Invariant Face Recognition , 2009, 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance.

[21]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[22]  A. P. Dawid,et al.  Regression and Classification Using Gaussian Process Priors , 2009 .

[23]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Ryan P. Adams,et al.  Elliptical slice sampling , 2009, AISTATS.

[25]  David A. Forsyth,et al.  Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry , 2010, ECCV.

[26]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[27]  Geoffrey E. Hinton,et al.  Transforming Auto-Encoders , 2011, ICANN.

[28]  Song-Chun Zhu,et al.  Image Parsing via Stochastic Scene Grammar , 2011 .

[29]  Radford M. Neal MCMC Using Hamiltonian Dynamics , 2011, 1206.1901.

[30]  Ira Kemelmacher-Shlizerman,et al.  Face Reconstruction from a Single Image using a Single Reference Face Shape , 2009 .

[31]  Jeffrey S. Rosenthal,et al.  Optimal Proposal Distributions and Adaptive MCMC , 2011 .

[32]  Noah D. Goodman,et al.  Nonstandard Interpretations of Probabilistic Programs for Efficient Inference , 2011, NIPS.

[33]  Noah D. Goodman,et al.  Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation , 2011, AISTATS.

[34]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[35]  Song-Chun Zhu,et al.  Image Parsing with Stochastic Scene Grammar , 2011, NIPS.

[36]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[37]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[38]  Kobus Barnard,et al.  Understanding Bayesian Rooms Using Composite 3D Object Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Ghassan Hamarneh,et al.  Bilateral Maps for Partial Matching , 2013, Comput. Graph. Forum.

[40]  R. Wilkinson Approximate Bayesian computation (ABC) gives exact results under the assumption of model error , 2008, Statistical applications in genetics and molecular biology.

[41]  Joshua B. Tenenbaum,et al.  Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs , 2013, NIPS.

[42]  Oswald Aldrian,et al.  Inverse Rendering of Faces with a 3D Morphable Model , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Thomas Vetter,et al.  Human face shape analysis under spherical harmonics illumination considering self occlusion , 2013, 2013 International Conference on Biometrics (ICB).

[44]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[45]  Noah D. Goodman,et al.  Learning Stochastic Inverses , 2013, NIPS.

[46]  N. Mitra,et al.  Meta-representation of shape families , 2014, ACM Trans. Graph..

[47]  Frank D. Wood,et al.  A Compilation Target for Probabilistic Programming Languages , 2014, ICML.

[48]  Frank D. Wood,et al.  A New Approach to Probabilistic Programming Inference , 2014, AISTATS.

[49]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[50]  Yinda Zhang,et al.  PanoContext: A Whole-Room 3D Context Model for Panoramic Scene Understanding , 2014, ECCV.

[51]  Michael J. Black,et al.  OpenDR: An Approximate Differentiable Renderer , 2014, ECCV.

[52]  Yura N. Perov,et al.  Venture: a higher-order probabilistic programming platform with programmable inference , 2014, ArXiv.

[53]  Tijmen Tieleman,et al.  Optimizing Neural Networks that Generate Iimages , 2014 .

[54]  Niloy J. Mitra,et al.  ShapeSynth: Parameterizing model collections for coupled shape exploration and synthesis , 2014, Comput. Graph. Forum.

[55]  Ilker Yildirim Efficient and robust analysis-by-synthesis in vision : A computational framework , behavioral tests , and modeling neuronal representations , 2015 .

[56]  Joshua B. Tenenbaum,et al.  Efficient analysis-by-synthesis in vision: A computational framework, behavioral tests, and modeling neuronal representations , 2015, Annual Meeting of the Cognitive Science Society.

[57]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[58]  Jitendra Malik,et al.  Shape, Illumination, and Reflectance from Shading , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Sebastian Nowozin,et al.  The informed sampler: A discriminative approach to Bayesian inference in generative computer vision models , 2014, Comput. Vis. Image Underst..

[60]  Ardavan Saeedi,et al.  Variational Particle Approximations , 2014, J. Mach. Learn. Res..