Andhill-98: A RoboCup Team which Reinforces Positioning with Observation

In reinforcement learning with limited exploration, an agent's policy tends to fall into a poor local optimum. This paper proposes the Observational Reinforcement Learning method, with which the learning agent evaluates inexperienced policies and reinforces them. The method gives the agent more chances to escape from a local optimum without exploration. Moreover, this paper demonstrates the effectiveness of the method through experiments on the RoboCup positioning problem, which extend the experiments described in our RoboCup-97 paper [1].
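The core idea above can be illustrated with a minimal sketch. The task, the update rule, and all names here are hypothetical, not the paper's actual algorithm: a greedy (exploration-free) value learner on a two-action task, extended with an extra "observational" update that also reinforces the value estimate of the action it did not take, as if evaluating the inexperienced policy by observation.

```python
# Hypothetical two-action task: action 0 pays 1.0, action 1 pays 2.0.
REWARD = {0: 1.0, 1: 2.0}

def act(q):
    # Greedy policy with no exploration: always pick the
    # highest-valued action.
    return max(q, key=q.get)

def learn(q, episodes, observe=False, alpha=0.5):
    for _ in range(episodes):
        a = act(q)
        q[a] += alpha * (REWARD[a] - q[a])  # ordinary value update
        if observe:
            # Observational update (sketch): also evaluate the action
            # the agent did NOT take, as if it were observed, and
            # reinforce that value estimate too.
            other = 1 - a
            q[other] += alpha * (REWARD[other] - q[other])
    return q

# Both agents start with a misleadingly high estimate for the worse
# action, so a purely greedy learner never tries the better one.
q_plain = learn({0: 1.5, 1: 0.0}, episodes=20)
q_obs = learn({0: 1.5, 1: 0.0}, episodes=20, observe=True)

print(act(q_plain))  # 0: stuck in the local optimum
print(act(q_obs))    # 1: escaped without any exploratory actions
```

The plain greedy learner keeps choosing action 0 forever, while the observational variant revises its estimate of the untried action and switches to it, which is the "escape without exploration" effect the abstract describes.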