Distributed Adaptive Importance Sampling on graphical models using MapReduce

For graphical models, the algorithms used to evaluate a query fall broadly into two classes: exact and approximate inference algorithms. Exact inference algorithms use only network parameters to evaluate a query. However, they are typically intractable on large networks because their time and space complexity is exponential. Approximate inference algorithms, which include sampling-based and propagation-based methods, are widely used in practice to overcome this constraint at some cost in accuracy. These approximate algorithms may themselves suffer from scalability issues when applied to large networks, particularly when higher accuracy is required. To address this challenge, we have designed and implemented several MapReduce-based distributed versions of a specific approximate inference algorithm, Adaptive Importance Sampling (AIS). We compare and evaluate the proposed approaches on benchmark networks. Experimental results show that the proposed approaches achieve significant scaleup and speedup compared to the sequential method, while achieving similar accuracy asymptotically.
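To make the idea concrete, the following is a minimal sketch of importance sampling arranged in a map/reduce style. It is not the paper's algorithm. The two-node network A -> B, its probabilities, the function names, and the proposal update rule (a simple step toward the current posterior estimate, controlled by a hypothetical `lr` parameter) are all assumptions chosen for illustration. The mappers run sequentially here through Python's built-in `map`; on a real cluster each one would be an independent task.

```python
import random
from functools import reduce

# Hypothetical two-node network A -> B with evidence B = 1 (illustrative values).
P_A1 = 0.2                        # prior P(A=1)
P_B1_GIVEN_A = {0: 0.1, 1: 0.9}   # likelihood P(B=1 | A)


def exact_posterior():
    """Exact P(A=1 | B=1) by Bayes' rule, used only as a reference value."""
    num = P_A1 * P_B1_GIVEN_A[1]
    den = num + (1 - P_A1) * P_B1_GIVEN_A[0]
    return num / den


def map_task(args):
    """Mapper: draw n samples of A from the proposal and return partial weight sums."""
    seed, n, q_a1 = args
    rng = random.Random(seed)     # each mapper gets its own seeded generator
    w_a1 = 0.0                    # sum of weights for samples with A = 1
    w_total = 0.0                 # sum of all weights
    for _ in range(n):
        a = 1 if rng.random() < q_a1 else 0
        prior = P_A1 if a == 1 else 1 - P_A1
        proposal = q_a1 if a == 1 else 1 - q_a1
        w = prior * P_B1_GIVEN_A[a] / proposal   # importance weight P(a, e) / q(a)
        w_total += w
        if a == 1:
            w_a1 += w
    return (w_a1, w_total)


def reduce_task(x, y):
    """Reducer: add two pairs of partial weight sums."""
    return (x[0] + y[0], x[1] + y[1])


def distributed_ais(rounds=5, mappers=4, samples_per_mapper=2000, lr=0.5):
    """Run several map/reduce rounds, adapting the proposal between rounds."""
    q_a1 = 0.5                    # initial proposal probability for A = 1
    acc = (0.0, 0.0)              # weight sums pooled over all rounds
    for r in range(rounds):
        tasks = [(r * mappers + m, samples_per_mapper, q_a1) for m in range(mappers)]
        round_sums = reduce(reduce_task, map(map_task, tasks), (0.0, 0.0))
        acc = reduce_task(acc, round_sums)
        estimate = acc[0] / acc[1]
        # Assumed update rule: move the proposal toward the current estimate,
        # clipped away from 0 and 1 so that the weights stay bounded.
        q_a1 = min(0.95, max(0.05, (1 - lr) * q_a1 + lr * estimate))
    return acc[0] / acc[1]


if __name__ == "__main__":
    print("estimate:", distributed_ais())
    print("exact:   ", exact_posterior())
```

Because each mapper returns only two sums, the reduce step is a cheap associative addition, which is what makes this kind of sampler straightforward to distribute. The only sequential dependency is the proposal update between rounds.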
