Potential-Based Advice for Stochastic Policy Learning
Radha Poovendran | Linda Bushnell | Hannaneh Hajishirzi | Baicen Xiao | Andrew Clark | Bhaskar Ramasubramanian
[1] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[2] Martin A. Riedmiller, et al. Reinforcement learning in feedback control, 2011, Machine Learning.
[3] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[4] Sam Devlin, et al. Potential-based reward shaping for finite horizon online POMDP planning, 2015, Autonomous Agents and Multi-Agent Systems.
[5] Peter Stone, et al. Combining manual feedback with subsequent MDP reward signals for reinforcement learning, 2010, AAMAS.
[6] Andrea Lockerd Thomaz, et al. Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, 2006, AAAI.
[7] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[8] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[9] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[10] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[11] Tamer Basar, et al. A Finite Sample Analysis of the Actor-Critic Algorithm, 2018, IEEE Conference on Decision and Control (CDC).
[12] Garrison W. Cottrell, et al. Principled Methods for Advising Reinforcement Learning Agents, 2003, ICML.
[13] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[14] Sam Devlin, et al. Expressing Arbitrary Reward Functions as Potential-Based Advice, 2015, AAAI.
[15] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[18] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, 2018, ICML.
[19] Marek Grzes, et al. Reward Shaping in Episodic Reinforcement Learning, 2017, AAMAS.
[20] Sean P. Meyn, et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning, 2000, SIAM J. Control Optim.
[21] Ofir Marom, et al. Belief Reward Shaping in Reinforcement Learning, 2018, AAAI.
[22] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[23] Jing Peng, et al. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
[24] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Automatica.
[25] Sam Devlin, et al. Dynamic potential-based reward shaping, 2012, AAMAS.
[26] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[27] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[28] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[29] Lucian Busoniu, et al. Reinforcement learning for control: Performance, stability, and deep approximators, 2018, Annual Reviews in Control.
[30] Eric Wiewiora, et al. Potential-Based Shaping and Q-Value Initialization are Equivalent, 2003, J. Artif. Intell. Res.
[31] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[32] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[33] Mao Li, et al. Introspective Reinforcement Learning and Learning from Demonstration, 2018, AAMAS.
[34] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).