Cross-Entropy Estimators for Sequential Experiment Design with Reinforcement Learning

Reinforcement learning can effectively learn amortised policies for designing sequences of experiments. However, current methods rely on contrastive estimators of the expected information gain, which require an exponential number of contrastive samples to achieve unbiased estimates. We propose an alternative lower bound estimator based on the cross-entropy between the joint model distribution and a flexible proposal distribution, where the proposal approximates the true posterior over the model parameters given the experimental history and the design policy. Our estimator requires no contrastive samples, achieves more accurate estimates of high information gains, enables the learning of superior design policies, and is compatible with implicit probabilistic models. We evaluate our algorithm on a variety of tasks, covering continuous and discrete design spaces as well as explicit and implicit likelihoods.
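
As a concrete illustration (a sketch, not a verbatim statement of the paper's objective), a cross-entropy lower bound of the kind described above can be written in the classical Barber-Agakov form. Here \(\theta\) denotes the model parameters, \(h_T\) the history of designs and outcomes generated under policy \(\pi\), and \(q_\phi\) the flexible proposal distribution; this notation is our assumption:

\begin{align}
\mathcal{I}(\theta; h_T \mid \pi)
  &= \mathbb{E}_{p(\theta)\, p(h_T \mid \theta, \pi)}\!\left[\log \frac{p(\theta \mid h_T)}{p(\theta)}\right] \\
  &\geq \mathbb{E}_{p(\theta)\, p(h_T \mid \theta, \pi)}\!\left[\log q_\phi(\theta \mid h_T)\right] + \mathrm{H}\!\left[p(\theta)\right].
\end{align}

The slack in the bound is the expected KL divergence \(\mathbb{E}_{p(h_T \mid \pi)}\,\mathrm{KL}\!\left[p(\theta \mid h_T)\,\|\,q_\phi(\theta \mid h_T)\right]\), so maximising the right-hand side over \(\phi\) simultaneously tightens the estimate and trains the proposal to approximate the true posterior. Crucially, each joint sample \((\theta, h_T)\) contributes a single \(\log q_\phi\) term, so no contrastive samples are required.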
