Evolutionary reinforcement learning of dynamical large deviations

We show how to bound and calculate the likelihood of dynamical large deviations using evolutionary reinforcement learning. An agent, a stochastic model, propagates a continuous-time Monte Carlo trajectory and receives a reward conditioned upon the values of certain path-extensive quantities. Evolution produces progressively fitter agents, potentially allowing the calculation of a piece of a large-deviation rate function for a particular model and path-extensive quantity. For models with small state spaces, the evolutionary process acts directly on rates, and for models with large state spaces, the process acts on the weights of a neural network that parameterizes the model's rates. This approach shows how path-extensive physics problems can be considered within a framework widely used in machine learning.
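
As a rough illustration of the loop described in the abstract, the sketch below evolves the two rates of a toy two-state continuous-time Markov chain by mutation and selection. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state model, the function names (`simulate_activity`, `fitness`, `mutate`, `evolve`), the choice of jumps per unit time as the path-extensive quantity, and the reward, which is simply how close the observed value lies to a target. It does not compute a rate-function bound.

```python
import math
import random


def simulate_activity(rates, total_time, rng):
    """Run a two-state continuous-time Monte Carlo trajectory.

    rates[s] is the rate of leaving state s (0 -> 1 or 1 -> 0).
    Returns the number of jumps divided by the trajectory length.
    """
    state, t, jumps = 0, 0.0, 0
    while True:
        t += rng.expovariate(rates[state])  # exponential waiting time
        if t > total_time:
            break
        state = 1 - state
        jumps += 1
    return jumps / total_time


def fitness(rates, target, total_time, rng):
    """Toy reward (an assumption): closeness of the activity to a target."""
    return -abs(simulate_activity(rates, total_time, rng) - target)


def mutate(rates, scale, rng):
    """Perturb each rate multiplicatively so that it stays positive."""
    return [r * math.exp(rng.gauss(0.0, scale)) for r in rates]


def evolve(initial_rates, target, generations=200, total_time=500.0,
           scale=0.1, seed=0):
    """Mutation-and-selection loop: keep a mutant only if it is at least as fit."""
    rng = random.Random(seed)
    best = list(initial_rates)
    best_fit = fitness(best, target, total_time, rng)
    for _ in range(generations):
        child = mutate(best, scale, rng)
        child_fit = fitness(child, target, total_time, rng)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit


if __name__ == "__main__":
    rates, fit = evolve([1.0, 1.0], target=3.0)
    print("evolved rates:", rates, "fitness:", fit)
```

Because each fitness value comes from a single noisy trajectory, a lucky evaluation can stall further progress; averaging over several trajectories per agent would be one way to reduce that effect. For a model with a large state space, the same loop could act on the weights of a network that outputs the rates rather than on the rates themselves.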

Stephen Whitelam | Isaac Tamblyn | Daniel Jacobson
