Scientific Discovery and the Cost of Measurement - Balancing Information and Cost in Reinforcement Learning

The use of reinforcement learning (RL) in scientific applications, such as materials design and automated chemistry, is increasing. A major challenge, however, lies in the fact that measuring the state of the system is often costly and time-consuming in scientific applications, whereas policy learning with RL requires a measurement after each time step. In this work, we make the measurement costs explicit in the form of a costed reward and propose a framework that enables off-the-shelf deep RL algorithms to learn a policy for both selecting actions and determining whether or not to measure the current state of the system at each time step. In this way, agents learn to balance the need for information with the cost of information. Our results show that, when trained under this regime, Dueling DQN and PPO agents can learn optimal action policies whilst making up to 50% fewer state measurements, and recurrent neural networks can produce a greater than 50% reduction in measurements. We postulate that these reductions can help to lower the barrier to applying RL to real-world scientific applications.
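One way to picture the framework described above is as an environment wrapper that augments each action with a binary measure/skip flag and subtracts a fixed observation cost from the reward whenever the agent requests a fresh state. The sketch below is a minimal illustration under those assumptions; the class and parameter names (`ActiveMeasureWrapper`, `measure_cost`, the toy `ToyChainEnv`) are hypothetical and not taken from the paper's implementation.

```python
class ToyChainEnv:
    """Minimal 1-D chain environment used only to exercise the wrapper."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = left, 1 = right
        self.pos = max(0, min(self.length - 1, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}


class ActiveMeasureWrapper:
    """Augments actions with a measure/skip flag and charges a cost
    each time the agent requests a fresh observation of the state
    (a sketch of a 'costed reward', not the authors' code)."""

    def __init__(self, env, measure_cost=0.1):
        self.env = env
        self.measure_cost = measure_cost
        self.last_obs = None

    def reset(self):
        self.last_obs = self.env.reset()
        return self.last_obs

    def step(self, action, measure):
        obs, reward, done, info = self.env.step(action)
        if measure:
            self.last_obs = obs          # pay to see the true state
            reward -= self.measure_cost  # measurement cost folded into reward
        # otherwise the agent keeps acting on its stale observation
        return self.last_obs, reward, done, info


env = ActiveMeasureWrapper(ToyChainEnv(), measure_cost=0.1)
obs0 = env.reset()
obs1, r1, _, _ = env.step(action=1, measure=True)   # measured: reward carries -0.1
obs2, r2, _, _ = env.step(action=1, measure=False)  # skipped: stale obs, no cost
```

Because the skip branch returns the last measured observation unchanged, a standard deep RL agent can be trained on this wrapper without modification; it simply sees a larger action space and a reward that penalizes unnecessary measurements.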
