Learning to Do or Learning While Doing: Reinforcement Learning and Bayesian Optimisation for Online Continuous Tuning

Online tuning of real-world plants is a complex optimisation problem that continues to require manual intervention by experienced human operators. Autonomous tuning is a rapidly expanding field of research in which learning-based methods, such as Reinforcement Learning-trained Optimisation (RLO) and Bayesian optimisation (BO), hold great promise for achieving outstanding plant performance and reducing tuning times. Which algorithm to choose in different scenarios, however, remains an open question. Here we present a comparative study using a routine task in a real particle accelerator as an example, showing that RLO generally outperforms BO, but is not always the best choice. Based on the study's results, we provide a clear set of criteria to guide the choice of algorithm for a given tuning task. These criteria can ease the adoption of learning-based autonomous tuning solutions in the operation of complex real-world plants, ultimately improving the availability and pushing the limits of operability of these facilities, thereby enabling scientific and engineering advancements.
