Policy Search with Non-uniform State Representations for Environmental Sampling

Surveying fragile ecosystems such as coral reefs is important for monitoring the effects of climate change. We present an adaptive sampling technique that generates efficient trajectories that cover the hotspots in a region of interest at a high rate. A key feature of our sampling algorithm is its ability to generate action plans for any new hotspot distribution using parameters learned on other, similar-looking distributions.
