Composable probabilistic inference with BLAISE

If we are to understand human-level cognition, we must understand how the mind finds the patterns that underlie the incomplete, noisy, and ambiguous data from our senses and that allow us to generalize our experiences to new situations. A wide variety of commercial applications face similar issues: industries from health services to business intelligence to oil field exploration depend critically on their ability to find patterns in vast amounts of data and to use those patterns to make accurate predictions. Probabilistic inference provides a unified, systematic framework for specifying and solving these problems. Recent work has demonstrated the great value of probabilistic models defined over complex, structured domains. However, our ability to imagine probabilistic models has far outstripped our ability to programmatically manipulate them and to implement inference effectively, limiting the complexity of the problems we can solve in practice.

This thesis presents BLAISE, a novel framework for composable probabilistic modeling and inference, designed to address these limitations. BLAISE has three components:

(a) The BLAISE State-Density-Kernel (SDK) graphical modeling language, which generalizes factor graphs by: (1) explicitly representing inference algorithms (and their locality) using a new type of graph node; (2) representing hierarchical composition and repeated substructures in the state space, the interest distribution, and the inference procedure; and (3) permitting the structure of the model to change during algorithm execution.

(b) A suite of SDK graph transformations that may be used to extend a model (e.g., to construct a mixture model from a model of a mixture component) or to make inference more effective (e.g., by automatically constructing a parallel tempered version of an algorithm, or by exploiting conjugacy in a model).

(c) The BLAISE Virtual Machine, a runtime environment that can efficiently execute the stochastic automata represented by BLAISE SDK graphs.

BLAISE encourages the construction of sophisticated models by composing simpler models, allowing the designer to implement and verify small portions of the model and inference method, and to reuse model components from one task to another. BLAISE decouples the implementation of the inference algorithm from the specification of the interest distribution, even in cases (such as Gibbs sampling) where the shape of the interest distribution guides the inference. This gives modelers the freedom to explore alternate models without slow, error-prone reimplementation. The compositional nature of BLAISE also enables novel reinterpretations of advanced Monte Carlo inference techniques, such as parallel tempering, as simple transformations of BLAISE SDK graphs.

In this thesis, I describe each component of the BLAISE modeling framework and validate the framework by highlighting a variety of sophisticated contemporary models developed by the BLAISE user community. I also present several surprising findings that stem from the framework: that an Infinite Relational Model can be built using exactly the same inference methods as a simple mixture model; that constructing a parallel tempered inference algorithm should be a point-and-click, one-line-of-code operation; and that Markov chain Monte Carlo for probabilistic models with complicated long-distance dependencies, such as a stochastic version of Scheme, can be managed using standard BLAISE mechanisms.
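To make the SDK idea concrete, the following is a minimal, hypothetical sketch in Python of the three node types and of model composition. Every name here (State, Density, Kernel, gaussian_density, mixture_density) is an illustrative assumption, not the actual BLAISE API; the sketch only shows the architecture the abstract describes: Density nodes compose hierarchically (a mixture built from unchanged component models), and a Kernel is written once against the Density interface, decoupled from the particular interest distribution it targets.

```python
# Hypothetical sketch of SDK node types; not the real BLAISE API.
import math
import random

class State:
    """Holds (part of) the model's state; real SDK States compose hierarchically."""
    def __init__(self, value):
        self.value = value

class Density:
    """Scores a State under (part of) the interest distribution."""
    def __init__(self, log_density_fn):
        self.log_density = log_density_fn

class Kernel:
    """A stochastic transition that leaves its attached Density invariant;
    here, a Metropolis-Hastings move assuming a symmetric proposal."""
    def __init__(self, proposal_fn, density):
        self.propose, self.density = proposal_fn, density
    def step(self, state):
        proposed = self.propose(state)
        log_accept = (self.density.log_density(proposed)
                      - self.density.log_density(state))
        return proposed if math.log(random.random()) < log_accept else state

def gaussian_density(mean, std):
    """A reusable component model: a univariate Gaussian Density."""
    return Density(lambda s: -0.5 * ((s.value - mean) / std) ** 2
                   - math.log(std * math.sqrt(2.0 * math.pi)))

def mixture_density(components, weights):
    """Compose component Densities into a mixture Density; the component
    models are reused unchanged, in the spirit of the thesis's
    component-to-mixture graph transformation."""
    return Density(lambda s: math.log(sum(
        w * math.exp(d.log_density(s))
        for w, d in zip(weights, components))))

# Swapping a single component for a two-component mixture requires no
# change to the inference code: the Kernel only sees the Density interface.
target = mixture_density(
    [gaussian_density(-3.0, 1.0), gaussian_density(3.0, 1.0)], [0.5, 0.5])
kernel = Kernel(lambda s: State(s.value + random.gauss(0.0, 1.0)), target)

state = State(0.0)
for _ in range(1000):
    state = kernel.step(state)
```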
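Under the same assumptions, parallel tempering can then be sketched as a transformation that wraps an existing Density and proposal into a composite Kernel over an ensemble of tempered replicas with replica-exchange swap moves. The class below is again hypothetical, not BLAISE's actual mechanism; the point is architectural: the modeler's call site is a single line, as the abstract argues it should be, and the underlying model is untouched.

```python
class ParallelTemperedKernel:
    """Hypothetical graph transformation: temper an existing model.
    Chain i targets density^(1/T_i); adjacent chains exchange states
    via a standard replica-exchange Metropolis test."""
    def __init__(self, density, proposal_fn, temperatures):
        self.density, self.temperatures = density, temperatures
        # One MH kernel per temperature, each targeting the tempered density.
        self.kernels = [
            Kernel(proposal_fn,
                   Density(lambda s, T=T: density.log_density(s) / T))
            for T in temperatures]

    def step(self, states):
        # Within-chain moves, one per replica.
        states = [k.step(s) for k, s in zip(self.kernels, states)]
        # Propose one swap between a random adjacent pair of replicas.
        i = random.randrange(len(states) - 1)
        li = self.density.log_density(states[i])
        lj = self.density.log_density(states[i + 1])
        log_accept = (lj - li) * (1.0 / self.temperatures[i]
                                  - 1.0 / self.temperatures[i + 1])
        if math.log(random.random()) < log_accept:
            states[i], states[i + 1] = states[i + 1], states[i]
        return states

# Tempering the mixture sampler is one line; `target` is reused as-is.
pt = ParallelTemperedKernel(
    target, lambda s: State(s.value + random.gauss(0.0, 1.0)),
    temperatures=[1.0, 2.0, 4.0])
states = [State(0.0) for _ in range(3)]
for _ in range(1000):
    states = pt.step(states)  # states[0] follows the untempered (T=1) chain
```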