Correction: Landmark based guidance for reinforcement learning agents under partial observability

[1] Faruk Polat, et al. Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states, 2022, Future Gener. Comput. Syst.

[2] Ye Yuan, et al. Principled reward shaping for reinforcement learning via Lyapunov stability theory, 2020, Neurocomputing.

[3] Guillaume Perrin, et al. Adaptive early classification of temporal sequences using deep reinforcement learning, 2020, Knowl. Based Syst.

[4] Ofir Marom, et al. Belief Reward Shaping in Reinforcement Learning, 2018, AAAI.

[5] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.

[6] D. Kudenko, et al. Potential-based reward shaping for finite horizon online POMDP planning, 2016, Autonomous Agents and Multi-Agent Systems.

[7] Eric Wiewiora, et al. Potential-Based Shaping and Q-Value Initialization are Equivalent, 2003, J. Artif. Intell. Res.

[8] Sidney Nascimento Givigi, et al. Policy Invariance under Reward Transformations for General-Sum Stochastic Games, 2011, J. Artif. Intell. Res.

[9] Daniel Kudenko, et al. Online learning of shaping rewards in reinforcement learning, 2010, Neural Networks.

[10] Bhaskara Marthi, et al. Automatic shaping and decomposition of reward functions, 2007, ICML '07.

[11] S. Hochreiter, et al. Long Short-Term Memory, 1997, Neural Computation.

[12] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[13] William Zhu, et al. Anchor: The achieved goal to replace the subgoal for hierarchical reinforcement learning, 2021, Knowl. Based Syst.

[14] William W. Cohen, et al. Machine Learning, Proceedings of the Eleventh International Conference, Rutgers University, New Brunswick, NJ, USA, July 10-13, 1994, ICML.