Scalable importance sampling estimation of Gaussian mixture posteriors in Bayesian networks

Abstract. In this paper, we propose a scalable importance sampling algorithm for computing Gaussian mixture posteriors in conditional linear Gaussian Bayesian networks. Our contribution is based on a stochastic gradient ascent procedure that takes as input a stream of importance sampling weights, so that a mixture of Gaussians is updated dynamically without storing the full sample. The algorithm follows a Map/Reduce design and therefore scales with the available computing resources. An implementation of the proposed algorithm is available as part of the AMIDST open-source toolbox for scalable probabilistic machine learning (http://www.amidsttoolbox.com).
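
The streaming idea summarised in the abstract can be illustrated with a minimal sketch. It is not the AMIDST implementation (AMIDST is a Java toolbox) and not necessarily the exact update rule of the paper. It assumes a univariate toy problem with a N(0, 3^2) proposal and a N(2, 1) target, a plain stochastic gradient ascent step on the weighted log-density w * log q(x), and a Robbins-Monro style decaying step size. The names `StreamingGaussianMixture`, `weighted_stream` and `normal_pdf` are hypothetical, and the Map/Reduce distribution of the work is left out.

```python
import math
import random


def normal_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def weighted_stream(n, seed=0):
    """Lazily yield (sample, importance weight) pairs, one at a time.

    Toy assumption: proposal N(0, 3^2), target N(2, 1).
    """
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.gauss(0.0, 3.0)
        yield x, normal_pdf(x, 2.0, 1.0) / normal_pdf(x, 0.0, 3.0)


class StreamingGaussianMixture:
    """Univariate Gaussian mixture updated from one weighted sample at a time."""

    def __init__(self, means, sigmas, step=0.05):
        self.means = list(means)
        self.log_sigmas = [math.log(s) for s in sigmas]
        self.logits = [0.0] * len(self.means)  # uniform mixing weights
        self.step = step
        self.t = 0  # number of samples consumed so far

    def mixing_weights(self):
        # Softmax of the logits, shifted by the maximum for stability.
        top = max(self.logits)
        unnorm = [math.exp(v - top) for v in self.logits]
        total = sum(unnorm)
        return [u / total for u in unnorm]

    def sigmas(self):
        return [math.exp(v) for v in self.log_sigmas]

    def responsibilities(self, x):
        # Posterior probability of each component given x; constants that
        # are shared by all components cancel in the normalisation.
        scores = [
            logit - log_s - 0.5 * ((x - m) / math.exp(log_s)) ** 2
            for logit, m, log_s in zip(self.logits, self.means, self.log_sigmas)
        ]
        top = max(scores)
        unnorm = [math.exp(s - top) for s in scores]
        total = sum(unnorm)
        return [u / total for u in unnorm]

    def update(self, x, w):
        """One stochastic gradient ascent step on w * log q(x)."""
        self.t += 1
        lr = self.step / self.t ** 0.6  # decaying step size
        pis = self.mixing_weights()
        sigmas = self.sigmas()
        resp = self.responsibilities(x)
        for k in range(len(self.means)):
            z = (x - self.means[k]) / sigmas[k]
            # Gradients of log q(x) w.r.t. mean, log-sigma and logit of component k.
            self.means[k] += lr * w * resp[k] * z / sigmas[k]
            self.log_sigmas[k] += lr * w * resp[k] * (z * z - 1.0)
            self.logits[k] += lr * w * (resp[k] - pis[k])
```

A possible usage is `gm = StreamingGaussianMixture([0.0, 3.0], [2.0, 2.0])` followed by `for x, w in weighted_stream(5000): gm.update(x, w)`. Each pair is discarded after its update, so memory use does not grow with the sample size. The log-sigma and logit parameterisation is chosen here only to keep the standard deviations positive and the mixing weights normalised without explicit constraints.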
