Correction: Landmark based guidance for reinforcement learning agents under partial observability
[1] Faruk Polat, et al. Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states, 2022, Future Gener. Comput. Syst.
[2] Ye Yuan, et al. Principled reward shaping for reinforcement learning via Lyapunov stability theory, 2020, Neurocomputing.
[3] Guillaume Perrin, et al. Adaptive early classification of temporal sequences using deep reinforcement learning, 2020, Knowl. Based Syst.
[4] Ofir Marom, et al. Belief Reward Shaping in Reinforcement Learning, 2018, AAAI.
[5] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[6] D. Kudenko, et al. Potential-based reward shaping for finite horizon online POMDP planning, 2016, Autonomous Agents and Multi-Agent Systems.
[7] Eric Wiewiora. Potential-Based Shaping and Q-Value Initialization are Equivalent, 2003, J. Artif. Intell. Res.
[8] Sidney Nascimento Givigi, et al. Policy Invariance under Reward Transformations for General-Sum Stochastic Games, 2011, J. Artif. Intell. Res.
[9] Daniel Kudenko, et al. Online learning of shaping rewards in reinforcement learning, 2010, Neural Networks.
[10] Bhaskara Marthi, et al. Automatic shaping and decomposition of reward functions, 2007, ICML '07.
[11] S. Hochreiter, et al. Long Short-Term Memory, 1997, Neural Computation.
[12] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[13] William Zhu, et al. Anchor: The achieved goal to replace the subgoal for hierarchical reinforcement learning, 2021, Knowl. Based Syst.
[14] William W. Cohen, et al. Machine Learning, Proceedings of the Eleventh International Conference, Rutgers University, New Brunswick, NJ, USA, July 10-13, 1994, 1994, ICML.