Landmark based guidance for reinforcement learning agents under partial observability