A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot

We address the problem of online path planning for optimal sensing with a mobile robot. The objective of the robot is to learn as much as possible about its pose and the environment given time constraints. We model the finite-horizon planning problem as a partially observable Markov decision process (POMDP) with a utility function that depends on the belief state, and we replan as the robot moves through the environment. The POMDP is high-dimensional, continuous, non-differentiable, nonlinear, and non-Gaussian, and must be solved in real time, so most existing techniques for stochastic planning and reinforcement learning are inapplicable. To solve this problem, we propose a Bayesian optimization method that dynamically trades off exploration (minimizing uncertainty in unknown parts of the policy space) and exploitation (capitalizing on the current best solution). We demonstrate our approach with a visually guided mobile robot. The solution proposed here also applies to closely related domains, including active vision, sequential experimental design, dynamic sensing, and calibration with mobile sensors.
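To make the exploration-exploitation trade-off concrete, here is a minimal sketch of a Bayesian optimization loop with a Gaussian process surrogate and the expected-improvement acquisition. The 1-D policy parameter, the toy objective `toy_policy_value`, and the evaluation budget are illustrative assumptions, not the paper's implementation; in the planning setting above, each objective evaluation would correspond to simulating the robot under a candidate policy and scoring the resulting belief state.

```python
# Sketch: Bayesian optimization with a GP surrogate and expected improvement (EI).
# EI trades off exploitation (high predicted utility) against exploration
# (high predictive uncertainty), as described in the abstract.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def toy_policy_value(theta):
    # Hypothetical stand-in for the expensive policy evaluation
    # (simulate the robot, score the resulting belief state).
    return -np.sin(3 * theta) - theta**2 + 0.7 * theta

def expected_improvement(candidates, gp, best_y, xi=0.01):
    # EI for maximization: reward candidates whose posterior mean exceeds
    # the incumbent, or whose posterior uncertainty is still large.
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # guard against division by zero
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(3, 1))      # initial policy parameters
y = np.array([toy_policy_value(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(15):                           # small evaluation budget
    gp.fit(X, y)
    cand = np.linspace(-2.0, 2.0, 400).reshape(-1, 1)
    theta_next = cand[np.argmax(expected_improvement(cand, gp, y.max()))]
    X = np.vstack([X, theta_next])
    y = np.append(y, toy_policy_value(theta_next[0]))

print(f"best parameter: {X[np.argmax(y)][0]:.3f}, value: {y.max():.3f}")
```

Sample efficiency is the point of this design: because each policy evaluation is expensive, the surrogate concentrates the evaluation budget on regions of the policy space that are either promising or poorly understood.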
