[1] Eric Wiewiora, et al. Potential-Based Shaping and Q-Value Initialization are Equivalent, 2003, J. Artif. Intell. Res.
[2] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[3] Ana Paiva, et al. Learning by appraising: an emotion-based approach to intrinsic reward design, 2014, Adapt. Behav.
[4] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[5] M. Grzes, et al. Plan-based reward shaping for reinforcement learning, 2008, 2008 4th International IEEE Conference Intelligent Systems.
[6] R. Bellman. A Markovian Decision Process, 1957.
[7] Sam Devlin, et al. Dynamic potential-based reward shaping, 2012, AAMAS.
[8] Sonia Chernova, et al. Reinforcement Learning from Demonstration through Shaping, 2015, IJCAI.
[9] Sonia Chernova, et al. Integrating reinforcement learning with human demonstrations of varying ability, 2011, AAMAS.
[10] Tim Salimans, et al. Learning Montezuma's Revenge from a Single Demonstration, 2018, ArXiv.
[11] Ana Paiva, et al. Emotion-Based Intrinsic Motivation for Reinforcement Learning Agents, 2011, ACII.
[12] Sonia Chernova, et al. Recent Advances in Robot Learning from Demonstration, 2020, Annu. Rev. Control. Robotics Auton. Syst.
[13] Richard L. Lewis, et al. Reward Design via Online Gradient Ascent, 2010, NIPS.
[14] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[15] Andrea Lockerd Thomaz, et al. Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance, 2006, AAAI.
[16] Sonia Chernova, et al. Learning from Demonstration for Shaping through Inverse Reinforcement Learning, 2016, AAMAS.
[17] Garrison W. Cottrell, et al. Principled Methods for Advising Reinforcement Learning Agents, 2003, ICML.
[18] Peter Stone, et al. Combining manual feedback with subsequent MDP reward signals for reinforcement learning, 2010, AAMAS.
[19] Marek Grzes, et al. Reward Shaping in Episodic Reinforcement Learning, 2017, AAMAS.
[20] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[21] Sam Devlin, et al. Plan-based reward shaping for multi-agent reinforcement learning, 2016, The Knowledge Engineering Review.
[22] Maja J. Mataric, et al. Reward Functions for Accelerated Learning, 1994, ICML.
[23] G. Baldassarre, et al. Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot, 2007, 2007 IEEE 6th International Conference on Development and Learning.
[24] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[25] Pierre-Yves Oudeyer, et al. Intrinsic Motivation Systems for Autonomous Mental Development, 2007, IEEE Transactions on Evolutionary Computation.
[26] Nuttapong Chentanez, et al. Intrinsically Motivated Learning of Hierarchical Collections of Skills, 2004.
[27] Tom Schaul, et al. Deep Q-learning From Demonstrations, 2017, AAAI.
[28] Daniel Kudenko, et al. Using plan-based reward shaping to learn strategies in StarCraft: Broodwar, 2013, 2013 IEEE Conference on Computational Intelligence in Games (CIG).
[29] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[30] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.
[31] Sam Devlin, et al. Overcoming incorrect knowledge in plan-based reward shaping, 2016, The Knowledge Engineering Review.