Landmark Based Reward Shaping in Reinforcement Learning with Hidden States
[1] M. Grzes, et al. Plan-based reward shaping for reinforcement learning, 2008, 2008 4th International IEEE Conference Intelligent Systems.
[2] Sam Devlin, et al. Theoretical considerations of potential-based reward shaping for multi-agent systems, 2011, AAMAS.
[3] Shie Mannor, et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning, 2002, ECML.
[4] Yunlong Liu, et al. Predictive State Representations with State Space Partitioning, 2015, AAMAS.
[5] Dana H. Ballard, et al. Learning to perceive and act by trial and error, 1991, Machine Learning.
[6] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[7] Risto Ritala, et al. Optimizing gaze direction in a visual navigation task, 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).
[8] Marek Grzes, et al. Reward Shaping in Episodic Reinforcement Learning, 2017, AAMAS.
[9] Yang Gao, et al. Potential Based Reward Shaping for Hierarchical Reinforcement Learning, 2015, IJCAI.
[10] Bhaskara Marthi, et al. Automatic shaping and decomposition of reward functions, 2007, ICML '07.
[11] Daniel Kudenko, et al. Online learning of shaping rewards in reinforcement learning, 2010, Neural Networks.
[12] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[13] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[14] John Loch, et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes, 1998, ICML.
[15] Lutz Frommberger, et al. Representing and Selecting Landmarks in Autonomous Learning of Robot Navigation, 2008, ICIRA.
[16] Chris Watkins, et al. Learning from delayed rewards, 1989.
[17] Sam Devlin, et al. Plan-based reward shaping for multi-agent reinforcement learning, 2016, The Knowledge Engineering Review.
[18] Sam Devlin, et al. Potential-based reward shaping for finite horizon online POMDP planning, 2015, Autonomous Agents and Multi-Agent Systems.
[19] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[20] Susanne Biundo-Stephan, et al. Improving Hierarchical Planning Performance by the Use of Landmarks, 2012, AAAI.
[21] Patrik Haslum, et al. Temporal Landmarks: What Must Happen, and When, 2015, ICAPS.
[22] Ofir Marom, et al. Belief Reward Shaping in Reinforcement Learning, 2018, AAAI.
[23] Michael R. James, et al. SarsaLandmark: an algorithm for learning in POMDPs with landmarks, 2009, AAMAS.