Multi-agent Reinforcement Learning Accelerated MCMC on Multiscale Inversion Problem

In this work, we propose a multi-agent actor-critic reinforcement learning (RL) algorithm to accelerate multi-level Markov chain Monte Carlo (MCMC) sampling. The policies (actors) of the agents generate the proposals in the MCMC steps, and a centralized critic estimates the long-term reward. We verify the proposed algorithm by solving an inverse problem with multiple scales. Traditional MCMC sampling faces two difficulties on this problem. First, computing the posterior distribution requires evaluating the forward solver, which is very time-consuming for a problem with heterogeneities. We therefore use a multi-level algorithm. More precisely, we use the generalized multiscale finite element method (GMsFEM) as the forward solver when evaluating the posterior distribution in the multi-level rejection procedure. Second, it is hard to design a proposal function that generates meaningful samples. To address this, we learn an RL policy as the proposal generator. Our experiments show that the proposed method significantly improves the sampling process.
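The multi-level rejection idea described above can be illustrated with a minimal two-stage Metropolis-Hastings sketch. This is not the authors' implementation. It rests on several assumptions: two toy one-dimensional Gaussian log-densities stand in for the coarse (GMsFEM-based) and fine posteriors, a symmetric random-walk proposal stands in for the learned RL policy, and every name (`two_stage_mh`, `log_post_coarse`, `log_post_fine`, `random_walk`) is hypothetical. The point it shows is that a cheap coarse-level test screens each proposal, so the expensive fine-level posterior is evaluated only for proposals that pass it.

```python
import math
import random


def two_stage_mh(log_post_coarse, log_post_fine, propose, x0, n_steps, rng):
    """Two-stage Metropolis-Hastings with a symmetric proposal (illustrative sketch)."""
    x = x0
    lc_x = log_post_coarse(x)
    lf_x = log_post_fine(x)
    samples = []
    fine_evals = 1  # counts calls to the expensive fine-level posterior
    for _ in range(n_steps):
        y = propose(x, rng)
        lc_y = log_post_coarse(y)
        # Stage 1: cheap screening with the coarse posterior only.
        # 1.0 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lc_y - lc_x:
            # Stage 2: evaluate the fine posterior and correct for the stage-1
            # test, so the chain still targets the fine posterior.
            lf_y = log_post_fine(y)
            fine_evals += 1
            log_ratio = (lf_y - lf_x) - (lc_y - lc_x)
            if math.log(1.0 - rng.random()) < log_ratio:
                x, lc_x, lf_x = y, lc_y, lf_y
        samples.append(x)  # on rejection at either stage the chain stays at x
    return samples, fine_evals


def log_post_fine(x):
    # Assumption: toy stand-in for the fine-scale posterior (standard normal).
    return -0.5 * x * x


def log_post_coarse(x):
    # Assumption: toy stand-in for a cheaper, slightly biased coarse posterior.
    return -0.5 * (x - 0.2) ** 2 / 1.5


def random_walk(x, rng, step=1.0):
    # Assumption: symmetric Gaussian random walk in place of a learned policy.
    return x + rng.gauss(0.0, step)


if __name__ == "__main__":
    chain, n_fine = two_stage_mh(
        log_post_coarse, log_post_fine, random_walk, 0.0, 20000, random.Random(0)
    )
    print("fine-level evaluations:", n_fine, "of", len(chain), "steps")
```

The acceptance tests above are valid only because the random-walk proposal is symmetric. A learned policy is generally not symmetric, in which case the stage-1 ratio would also need the proposal density ratio.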
