Reinforcement learning for landmark-based robot navigation

In landmark-based navigation, robots start in an unknown location and must navigate to a desired target using visually acquired landmarks. In the scenario that we are studying, the target is visible from the robot's initial location, but it may subsequently be occluded by intervening objects. The challenge for the robot is to acquire enough information about the environment so that, even in that case, it can move from the starting location to the target position.

In this paper, we build upon our previously described multiagent system for outdoor landmark-based navigation [2]. It is composed of three systems: the Pilot, responsible for all motions of the robot; the Vision system, responsible for identifying and tracking landmarks and for detecting obstacles; and the Navigation system, responsible for choosing high-level robot motions.

These three systems must cooperate to achieve the overall task of reaching the target. For instance, the Pilot needs the Vision system to identify obstacles and the Navigation system to select a path to the goal. The systems also compete: for instance, the Pilot and the Navigation system both contend for the Vision system. The Pilot needs it for obstacle avoidance, while the Navigation system needs it for landmark detection and tracking.

To manage this cooperation and competition, in [2] we chose a bidding mechanism. Each system generates bids for the services offered by the Pilot and Vision systems. The service actually executed by each system depends on the winning bid at each point in time. In [2] we proposed bidding functions to obtain good performance from the combined system. In this paper we use Reinforcement Learning (RL) [5] to tune the parameters of those functions.

The Navigation system is also implemented as a multiagent system composed of six agents with the following goals: keep the target located with maximum precision and reach it (Target Tracker), keep the risk of losing the target low (Risk Manager), recover from
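To make the coordination scheme concrete, the Python sketch below shows a winner-take-all arbitration for the shared Vision service together with a reward-driven search over the bid-function parameters. It is a minimal sketch, not the paper's implementation: the function names (winner, bid, simulate_episode, tune), the linear form of the bids, the toy reward, and the stochastic hill-climbing tuner are all illustrative assumptions standing in for the actual bidding functions and RL method.

import random

def winner(bids):
    """Winner-take-all arbitration: the shared service is granted to
    the system with the highest bid. bids maps system name -> bid."""
    return max(bids, key=bids.get)

def bid(k, urgency):
    """Illustrative parametric bid function: scaled urgency, clipped
    to [0, 1]. k is the parameter the learner tunes."""
    return max(0.0, min(1.0, k * urgency))

def simulate_episode(k_pilot, k_nav, rng, steps=50):
    """Toy episode: reward is earned when the Vision service goes to
    the system whose need is genuinely higher at that step."""
    reward = 0.0
    for _ in range(steps):
        u_pilot = rng.random()  # e.g., proximity of obstacles
        u_nav = rng.random()    # e.g., risk of losing the target
        bids = {"pilot": bid(k_pilot, u_pilot),
                "nav": bid(k_nav, u_nav)}
        truly_urgent = "pilot" if u_pilot > u_nav else "nav"
        reward += 1.0 if winner(bids) == truly_urgent else -1.0
    return reward

def tune(episodes=500, step=0.05, seed=0):
    """Stochastic hill climbing on the bid parameters, standing in
    for the paper's RL method: perturb the parameters, keep the
    perturbation if the episode return does not decrease."""
    rng = random.Random(seed)
    k = {"pilot": 0.5, "nav": 0.5}
    best = simulate_episode(k["pilot"], k["nav"], rng)
    for _ in range(episodes):
        cand = {s: min(1.0, max(0.0, v + rng.uniform(-step, step)))
                for s, v in k.items()}
        r = simulate_episode(cand["pilot"], cand["nav"], rng)
        if r >= best:
            k, best = cand, r
    return k, best

if __name__ == "__main__":
    params, score = tune()
    print("tuned bid parameters:", params, "episode return:", score)

In this toy setting the learner simply drives the two scaling parameters toward values that let the more urgent request win the Vision service; the paper's actual reward signal and bidding functions are richer, but the division of labor is the same: a fixed arbitration rule, with learning confined to the parameters of the bids.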

[1] Tony J. Prescott, et al. Spatial Representation for Navigation in Animats, 1996, Adapt. Behav.

[2] Dídac Busquets, et al. Multiagent Bidding Mechanisms for Robot Qualitative Navigation, 2000, ATAL.

[3] Elizabeth R. Stuck, et al. Using a Blackboard to Integrate Multiple Activities and Achieve Strategic Reasoning for Mobile-Robot Navigation, 1995, IEEE Expert.

[4] Julio Rosenblatt, et al. DAMN: a distributed architecture for mobile navigation, 1997, J. Exp. Theor. Artif. Intell.

[5] Anthony Stentz. The CODGER System for Mobile Robot Navigation, 1990.

[6] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.

[7] Rodney A. Brooks. A Robust Layered Control System for a Mobile Robot, 1986, IEEE J. Robotics Autom.

[8] Pattie Maes, et al. The Dynamics of Action Selection, 1989, IJCAI.

[9] M. Teresa Escrig, et al. Autonomous robot navigation using human spatial concepts, 2000, Int. J. Intell. Syst.

[10] Leslie Pack Kaelbling, et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.

[11] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[12] Can Isik, et al. Pilot level of a hierarchical controller for an unmanned mobile robot, 1988, IEEE J. Robotics Autom.

[13] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.

[14] Tod S. Levitt, et al. Qualitative Navigation for Mobile Robots, 1990, Artif. Intell.

[15] Ronald C. Arkin. Motor Schema-Based Mobile Robot Navigation, 1989, Int. J. Robotics Res.