Principled reward shaping for reinforcement learning via Lyapunov stability theory
[1] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[2] Sam Devlin, et al. Potential-based difference rewards for multiagent reinforcement learning, 2014, AAMAS.
[3] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[4] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[5] Wei Zhou, et al. Data driven discovery of cyber physical systems, 2018, Nature Communications.
[6] Sam Devlin, et al. Dynamic potential-based reward shaping, 2012, AAMAS.
[7] Garrison W. Cottrell, et al. Principled Methods for Advising Reinforcement Learning Agents, 2003, ICML.
[8] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[9] Sanjiv Kumar, et al. On the Convergence of Adam and Beyond, 2018.
[10] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[11] Claire J. Tomlin, et al. On the Powerball Method: Variants of Descent Methods for Accelerated Optimization, 2016, IEEE Control Systems Letters.
[12] Kagan Tumer, et al. Combining reward shaping and hierarchies for scaling to large multiagent systems, 2016, The Knowledge Engineering Review.
[13] Michael L. Littman, et al. Potential-based Shaping in Model-based Reinforcement Learning, 2008, AAAI.
[14] Marek Grzes, et al. Reward Shaping in Episodic Reinforcement Learning, 2017, AAMAS.
[15] R. Bellman. Dynamic Programming, 1957, Science.
[16] Zhongke Shi, et al. Reinforcement Learning Output Feedback NN Control Using Deterministic Learning Technique, 2014, IEEE Transactions on Neural Networks and Learning Systems.
[17] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[18] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res..
[19] Jun Liu, et al. On the Powerball Method for Optimization, 2016.
[20] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[21] Toshiharu Sugawara, et al. Coordinated behavior of cooperative agents using deep reinforcement learning, 2020, Neurocomputing.
[22] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[23] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[24] Andrew G. Barto, et al. Lyapunov Design for Safe Reinforcement Learning, 2003, J. Mach. Learn. Res..
[25] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[26] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[27] Yurong Liu, et al. A survey of deep neural network architectures and their applications, 2017, Neurocomputing.
[28] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[29] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[30] Sam Devlin, et al. Policy invariance under reward transformations for multi-objective reinforcement learning, 2017, Neurocomputing.
[31] Sam Devlin, et al. Potential-based reward shaping for POMDPs, 2013, AAMAS.
[32] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[33] A. M. Lyapunov. The general problem of the stability of motion, 1992.
[34] Bilal H. Abed-alguni, et al. Double Delayed Q-learning, 2018.