Inference Compilation and Universal Probabilistic Programming

We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do "compilation of inference" because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.
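
To make the role of the compiled network concrete, below is a minimal, self-contained sketch (not the authors' implementation) of importance sampling with a learned proposal, using a toy Gaussian model in place of an arbitrary probabilistic program. The `proposal_params` function is a hypothetical stand-in for the trained inference network, which in the paper maps observations to proposal distribution parameters; everything else (model, constants, particle count) is illustrative only.

```python
# Sketch: importance sampling where an "inference network" proposes the latent
# variable given the observation.  A fixed linear map plays the role of the
# trained network purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model: mu ~ Normal(0, 1); y ~ Normal(mu, 0.5)
PRIOR_MU, PRIOR_SIGMA = 0.0, 1.0
LIK_SIGMA = 0.5

def log_normal(x, mean, sigma):
    # Log density of a univariate Normal, used for prior, likelihood and proposal.
    return -0.5 * ((x - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def proposal_params(y):
    # Hypothetical stand-in for the compiled inference network:
    # observation -> (proposal mean, proposal std).
    return 0.8 * y, 0.6

def importance_sample(y, num_particles=1000):
    q_mu, q_sigma = proposal_params(y)
    # Draw latent samples from the proposal q(mu | y).
    mus = rng.normal(q_mu, q_sigma, size=num_particles)
    # Importance weights: p(mu) p(y | mu) / q(mu | y), computed in log space.
    log_w = (log_normal(mus, PRIOR_MU, PRIOR_SIGMA)
             + log_normal(y, mus, LIK_SIGMA)
             - log_normal(mus, q_mu, q_sigma))
    w = np.exp(log_w - log_w.max())
    return mus, w / w.sum()

mus, w = importance_sample(y=1.2)
print("posterior mean estimate:", np.sum(w * mus))
```

In this toy case the exact posterior mean given y = 1.2 is 0.96, so the weighted estimate should land close to that value; a well-trained proposal network keeps the weights nearly uniform, which is what makes amortized inference efficient at test time.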
