Composing graphical models with neural networks for structured representations and fast inference

We propose a general modeling and inference framework that composes probabilistic graphical models with deep learning methods, combining their complementary strengths. Our model family pairs graphical structure over latent variables with neural network observation models. For inference, we extend variational autoencoders to use graphical-model approximating distributions together with recognition networks that output conjugate potentials. All components of these models are learned simultaneously with a single objective, yielding a scalable algorithm that leverages stochastic variational inference, natural gradients, graphical-model message passing, and the reparameterization trick. We illustrate this framework with several example models and an application to mouse behavioral phenotyping.
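To make the pipeline concrete, below is a minimal NumPy sketch of the core computation for the simplest possible case: a single Gaussian latent node. A recognition network emits conjugate natural-parameter potentials, a conjugate Gaussian update plays the role of graphical-model message passing, and a single-sample evidence lower bound (ELBO) is estimated with the reparameterization trick. All function and parameter names here are illustrative assumptions, not the paper's API; models in this family use richer latent graphs (e.g., mixtures or switching linear dynamical systems) and deep observation networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def recognition_potentials(x, W1, b1, W2, b2):
    """Hypothetical recognition network: maps an observation x to conjugate
    Gaussian natural-parameter potentials (h, J), with precision J > 0."""
    hidden = np.tanh(W1 @ x + b1)
    out = W2 @ hidden + b2
    d = out.shape[0] // 2
    h = out[:d]                               # precision-weighted mean potential
    J = np.log1p(np.exp(out[d:]))             # softplus keeps the precision positive
    return h, J

def combine_with_prior(h, J, prior_h, prior_J):
    """Conjugate update: Gaussian natural parameters add, so 'message passing'
    on this single-node latent graph reduces to summing potentials."""
    post_J = prior_J + J
    post_mu = (prior_h + h) / post_J
    return post_mu, 1.0 / post_J              # mean and variance of q(z | x)

def elbo_estimate(x, post_mu, post_var, prior_mu, prior_var, dec_W, dec_b):
    """Single-sample ELBO via the reparameterization trick, with a
    linear-Gaussian decoder standing in for a deep observation network."""
    eps = rng.standard_normal(post_mu.shape)
    z = post_mu + np.sqrt(post_var) * eps     # reparameterized sample z ~ q(z | x)
    x_mean = dec_W @ z + dec_b                # neural observation model goes here
    log_lik = -0.5 * np.sum((x - x_mean) ** 2 + np.log(2 * np.pi))
    kl = 0.5 * np.sum(post_var / prior_var
                      + (post_mu - prior_mu) ** 2 / prior_var
                      - 1.0 + np.log(prior_var / post_var))
    return log_lik - kl

# Toy usage: 4-dim observation, 2-dim latent, standard-normal prior.
x = rng.standard_normal(4)
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
W2, b2 = rng.standard_normal((4, 8)), np.zeros(4)  # outputs 2 h's and 2 J's
h, J = recognition_potentials(x, W1, b1, W2, b2)
mu, var = combine_with_prior(h, J, prior_h=np.zeros(2), prior_J=np.ones(2))
print(elbo_estimate(x, mu, var, np.zeros(2), np.ones(2),
                    rng.standard_normal((4, 2)), np.zeros(4)))
```

The key design choice this sketch illustrates is that the recognition network outputs potentials conjugate to the latent graphical model, so the optimal local update stays in closed form; with structured latents, the sum of potentials above is replaced by exact message passing, and the single objective can be optimized with stochastic natural gradients for the graphical-model parameters and ordinary gradients for the network weights.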
