Evolutionary reinforcement learning of dynamical large deviations

We show how to bound and calculate the likelihood of dynamical large deviations using evolutionary reinforcement learning. An agent, a stochastic model, propagates a continuous-time Monte Carlo trajectory and receives a reward conditioned upon the values of certain path-extensive quantities. Evolution produces progressively fitter agents, potentially allowing the calculation of a piece of a large-deviation rate function for a particular model and path-extensive quantity. For models with small state spaces, the evolutionary process acts directly on rates, and for models with large state spaces, the process acts on the weights of a neural network that parameterizes the model's rates. This approach shows how path-extensive physics problems can be considered within a framework widely used in machine learning.
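As a rough illustration of the small-state-space case, the following minimal sketch lets an evolutionary process act directly on the rates of a two-state continuous-time Markov chain. Everything specific here is an assumption made for illustration and is not taken from the paper: the reference rates `REF_RATES`, the choice of jumps per unit time as the path-extensive quantity, the quadratic `penalty` used as the reward, and the mutate-and-accept rule. The quantity `j0` is the log-likelihood ratio of the trajectory under the agent and the reference model, divided by the trajectory length. For long trajectories it estimates an upper bound on the rate function at the agent's typical value of the observable.

```python
import math
import random

# Hypothetical reference model: a two-state continuous-time Markov chain.
# REF_RATES[x] is the rate of leaving state x for the other state.
REF_RATES = (1.0, 2.0)


def run_trajectory(agent_rates, total_time, rng):
    """Propagate the agent's dynamics for total_time; return (a, j0).

    a  : jumps per unit time (path-extensive quantity chosen for illustration).
    j0 : -(1/T) ln(P_ref[path] / P_agent[path]), a single-trajectory estimate
         of an upper bound on the rate function at the agent's typical a.
    """
    x, t, jumps, log_ratio = 0, 0.0, 0, 0.0
    while True:
        rate = agent_rates[x]
        dt = rng.expovariate(rate)  # exponential waiting time (Gillespie step)
        if t + dt > total_time:
            # Final partial residence: only the survival factors contribute.
            log_ratio -= (REF_RATES[x] - rate) * (total_time - t)
            break
        # Jump factor plus the escape-rate difference over the residence time.
        log_ratio += math.log(REF_RATES[x] / rate) - (REF_RATES[x] - rate) * dt
        t += dt
        jumps += 1
        x = 1 - x
    return jumps / total_time, -log_ratio / total_time


def fitness(agent_rates, target_a, total_time, rng, penalty=10.0):
    """Assumed reward: small bound j0 while keeping a close to target_a."""
    a, j0 = run_trajectory(agent_rates, total_time, rng)
    return -(j0 + penalty * (a - target_a) ** 2), a, j0


def evolve(target_a, generations=200, total_time=200.0, sigma=0.1, seed=0):
    """Mutate the rates and keep a mutant only if it is at least as fit."""
    rng = random.Random(seed)
    parent = list(REF_RATES)
    best, a, j0 = fitness(parent, target_a, total_time, rng)
    for _ in range(generations):
        # Multiplicative (log-normal) mutation keeps every rate positive.
        mutant = [r * math.exp(rng.gauss(0.0, sigma)) for r in parent]
        f, a_m, j0_m = fitness(mutant, target_a, total_time, rng)
        if f >= best:
            parent, best, a, j0 = mutant, f, a_m, j0_m
    return parent, a, j0, best


if __name__ == "__main__":
    rates, a, j0, best = evolve(target_a=2.0)
    print("rates:", rates, "a:", a, "bound estimate:", j0)
```

Because each fitness value comes from a single finite trajectory, it is noisy, and an accepted agent may simply have been lucky. Averaging over several trajectories or re-evaluating the parent each generation would reduce this effect. For large state spaces, the mutated parameters would be the weights of a neural network that outputs the rates rather than the rates themselves.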
