[1] H. Touchette. The large deviation approach to statistical mechanics, 2008, arXiv:0804.0327.
[2] Vivien Lecomte, et al. A numerical approach to large deviations in continuous time, 2007.
[3] Pankaj Mehta, et al. Reinforcement Learning in Different Phases of Quantum Control, 2017, Physical Review X.
[4] J. P. Garrahan, et al. Phases of quantum dimers from ensembles of classical stochastic trajectories, 2018, Physical Review B.
[6] Feng Chen, et al. Extreme spin squeezing from deep reinforcement learning, 2019, Physical Review A.
[7] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[8] Lin Lin, et al. Policy Gradient based Quantum Approximate Optimization Algorithm, 2020, MSML.
[9] Wulfram Gerstner, et al. Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons, 2013, PLoS Computational Biology.
[10] Mandayam A. L. Thathachar, et al. Local and Global Optimization Algorithms for Generalized Learning Automata, 1995, Neural Computation.
[11] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.
[12] M. Littman, et al. Mean Actor Critic, 2017, arXiv.
[13] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[14] Robert L. Jack, et al. Effective interactions and large deviations in stochastic processes, 2015, The European Physical Journal Special Topics.
[15] Avishek Das, et al. Variational control forces for enhanced sampling of nonequilibrium molecular dynamics simulations, 2019, The Journal of Chemical Physics.
[16] Richard S. Sutton, et al. Multi-step Reinforcement Learning: A Unifying Algorithm, 2017, AAAI.
[17] Vivek S. Borkar, et al. Q-Learning for Risk-Sensitive Control, 2002, Mathematics of Operations Research.
[18] Jeff G. Schneider, et al. Covariant policy search, 2003, IJCAI.
[20] Matteo Hessel, et al. General non-linear Bellman equations, 2019, arXiv.
[21] E. Solano, et al. Reinforcement learning for semi-autonomous approximate quantum eigensolver, 2019, Machine Learning: Science and Technology.
[22] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[23] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989.
[24] Chris Beeler, et al. Optimizing thermodynamic trajectories using evolutionary reinforcement learning, 2019, arXiv.
[25] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[26] Stephen Whitelam, et al. Evolutionary reinforcement learning of dynamical large deviations, 2019, The Journal of Chemical Physics.
[27] Marin Bukov, et al. Reinforcement learning for autonomous preparation of Floquet-engineered states: Inverting the quantum Kapitza oscillator, 2018, Physical Review B.
[28] Florian Marquardt, et al. Reinforcement Learning with Neural Networks for Quantum Feedback, 2018, Physical Review X.
[29] R. Jack. Ergodicity and large deviations in physical systems with stochastic dynamics, 2019, The European Physical Journal B.
[31] Vicenç Gómez,et al. Optimal control as a graphical model inference problem , 2009, Machine Learning.
[32] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[33] Vivek S. Borkar, et al. Adaptive Importance Sampling Technique for Markov Chains Using Stochastic Approximation, 2006, Operations Research.
[34] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[35] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[36] S. Whitelam, et al. Direct evaluation of dynamical large-deviation rate functions using a variational ansatz, 2019, Physical Review E.
[37] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[38] J. P. Garrahan, et al. A Tensor Network Approach to Finite Markov Decision Processes, 2020, arXiv.
[39] Richard S. Sutton, et al. Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning, 2017, AAMAS.
[40] Vivek S. Borkar, et al. A Learning Algorithm for Risk-Sensitive Cost, 2008, Mathematics of Operations Research.
[41] Sham M. Kakade, et al. Optimizing Average Reward Using Discounted Rewards, 2001, COLT/EuroCOLT.
[42] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[43] Garnet Kin-Lic Chan, et al. Exact Fluctuations of Nonequilibrium Steady States from Approximate Auxiliary Dynamics, 2017, Physical Review Letters.
[44] Harm van Seijen, et al. Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation, 2016, arXiv.
[45] P. Dupuis, et al. Splitting for rare event simulation: A large deviation approach to design and analysis, 2007, arXiv:0711.2037.
[46] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[47] S. Majumdar, et al. Effective Langevin equations for constrained stochastic processes, 2015, arXiv:1503.02639.
[48] F. Cérou, et al. Adaptive Multilevel Splitting for Rare Event Analysis, 2007.
[49] R. Jack, et al. Absence of dissipation in trajectory ensembles biased by currents, 2016, arXiv:1602.03815.
[50] J. P. Garrahan, et al. Using Matrix Product States to Study the Dynamical Large Deviations of Kinetically Constrained Models, 2019, Physical Review Letters.
[51] Ryan P. Adams, et al. A Theoretical Connection Between Statistical Physics and Reinforcement Learning, 2019, arXiv.
[52] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[53] Vivien Lecomte, et al. Simulating Rare Events in Dynamical Processes, 2011, arXiv:1106.4929.
[55] M. Yor, et al. Penalising Brownian Paths, 2009.
[56] Shimon Whiteson, et al. Expected Policy Gradients for Reinforcement Learning, 2018, Journal of Machine Learning Research.
[57] B. Derrida, et al. Large deviations conditioned on large deviations I: Markov chain and Langevin equation, 2018, Journal of Statistical Physics.
[58] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[59] Richard S. Sutton, et al. Two geometric input transformation methods for fast online reinforcement learning with neural nets, 2018, arXiv.
[60] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[61] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[62] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[63] Hilbert J. Kappen, et al. Adaptive Importance Sampling for Control and Inference, 2015, arXiv.
[64] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Automatica.
[65] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics and Autonomous Systems.
[66] J. Hooyberghs, et al. Density-matrix renormalization-group study of current and activity fluctuations near nonequilibrium phase transitions, 2008, Physical Review E.
[67] Martha White, et al. An Off-policy Policy Gradient Theorem Using Emphatic Weightings, 2018, NeurIPS.
[68] Sina Ghiassian, et al. Overcoming Catastrophic Interference in Online Reinforcement Learning with Dynamic Self-Organizing Maps, 2019, arXiv.
[69] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[70] Peter Sollich, et al. Large deviations and ensembles of trajectories in stochastic models, 2009, arXiv:0911.0211.
[71] Martha White, et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[72] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[73] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[74] Stefano Soatto, et al. Toward Understanding Catastrophic Forgetting in Continual Learning, 2019, arXiv.
[75] Jorge Kurchan, et al. Direct evaluation of large-deviation functions, 2005, Physical Review Letters.
[76] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[77] V. Borkar. Learning Algorithms for Risk-Sensitive Control, 2010.
[78] John N. Tsitsiklis, et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[79] Stefan Schaal, et al. Reinforcement Learning for Humanoid Robotics, 2003.
[80] Sergey Levine, et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, 2018, arXiv.
[81] Takahiro Nemoto, et al. Computation of large deviation statistics via iterative measurement-and-feedback procedure, 2013, Physical Review Letters.
[82] Freddy Bouchet, et al. Population-dynamics method with a multicanonical feedback control, 2016, Physical Review E.
[83] Pawel Cichosz, et al. Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning, 1994, Journal of Artificial Intelligence Research.
[84] Richard S. Sutton, et al. Comparing Policy-Gradient Algorithms, 2001.
[85] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[86] Hugo Touchette, et al. Variational and optimal control representations of conditioned and driven processes, 2015, arXiv:1506.05291.
[87] Garnet Kin-Lic Chan, et al. Constructing auxiliary dynamics for nonequilibrium stationary states by variance minimization, 2019, The Journal of Chemical Physics.
[88] J. P. Garrahan, et al. Rare behavior of growth processes via umbrella sampling of trajectories, 2017, Physical Review E.
[89] Peter L. Bartlett, et al. Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning, 2000, Journal of Computer and System Sciences.
[90] J. P. Garrahan. Classical stochastic dynamics and continuous matrix product states: gauge transformations, conditioned and driven processes, and equivalence of trajectory ensembles, 2016, arXiv:1602.07966.
[91] David T. Limmer, et al. Importance sampling large deviations in nonequilibrium steady states. I, 2017, The Journal of Chemical Physics.
[92] Austen Lamacraft, et al. Quantum Ground States from Reinforcement Learning, 2020, MSML.
[93] Rémi Munos, et al. Policy Gradient in Continuous Time, 2006, Journal of Machine Learning Research.
[94] J. P. Garrahan, et al. A deep learning functional estimator of optimal dynamics for sampling large deviations, 2020, Machine Learning: Science and Technology.
[95] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, Journal of Machine Learning Research.
[96] R. Ratcliff. Connectionist models of recognition memory: constraints imposed by learning and forgetting functions, 1990, Psychological Review.
[97] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, arXiv.
[98] Christopher Amato, et al. Efficient Eligibility Traces for Deep Reinforcement Learning, 2018, arXiv.
[99] Richard S. Sutton, et al. Discounted Reinforcement Learning is Not an Optimization Problem, 2019, arXiv.
[100] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[101] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[102] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[103] Hamid Reza Maei. Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation, 2018, arXiv.
[104] G. Chan, et al. Dynamical phase behavior of the single- and multi-lane asymmetric simple exclusion process via matrix product states, 2019, Physical Review E.
[105] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Transactions on Neural Networks.
[106] M. Cates, et al. Optimizing active work: Dynamical phase transitions, collective motion, and jamming, 2018, Physical Review E.
[107] David Chandler, et al. Transition path sampling: throwing ropes over rough mountain passes, in the dark, 2002, Annual Review of Physical Chemistry.
[108] Troels Arnfred Bojesen. Policy-guided Monte Carlo: Reinforcement-learning Markov chain dynamics, 2018, Physical Review E.
[109] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[110] H. Touchette, et al. Adaptive Sampling of Large Deviations, 2018, Journal of Statistical Physics.
[111] Frank L. Lewis, et al. Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem, 2009, International Joint Conference on Neural Networks.
[112] J. P. Garrahan. Aspects of non-equilibrium in classical and quantum systems: Slow relaxation and glasses, dynamical large deviations, quantum non-ergodicity, and open quantum dynamics, 2017, Physica A: Statistical Mechanics and its Applications.
[113] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[114] Travis B. Dick. Policy Gradient Reinforcement Learning Without Regret, 2015.
[115] Gerald Tesauro, et al. Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference, 2018, ICLR.
[116] John N. Tsitsiklis, et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes, 2003, Discrete Event Dynamic Systems.
[117] Kenji Doya. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[118] Emanuel Todorov. Linearly-solvable Markov decision problems, 2006, NIPS.
[119] J. P. Garrahan, et al. Dynamic Order-Disorder in Atomistic Models of Structural Glass Formers, 2009, Science.
[120] Rosalind J. Allen, et al. Malliavin Weight Sampling: A Practical Guide, 2013, Entropy.
[121] Philip Thomas. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[122] Jakub Dolezal, et al. Large deviations and optimal control forces for hard particles in one dimension, 2019, Journal of Statistical Mechanics: Theory and Experiment.
[123] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[124] Emanuel Todorov. Efficient computation of optimal actions, 2009, Proceedings of the National Academy of Sciences.
[125] Michael O. Duff. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.
[126] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[127] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[129] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[130] R. Jack, et al. Finite-Size Scaling of a First-Order Dynamical Phase Transition: Adaptive Population Dynamics and an Effective Model, 2016, Physical Review Letters.
[131] Dale Schuurmans, et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, 2017, ICLR.
[132] Markus Heyl, et al. Reinforcement Learning for Digital Quantum Simulation, 2020, Physical Review Letters.