Making Decisions with Spatially and Temporally Uncertain Data

We consider a decision-making problem in which the environment varies in both space and time. Such problems arise naturally in, e.g., the navigation of an underwater robot amidst ocean currents or of an aerial vehicle in wind. To model this spatiotemporal variation, we extend the standard Markov Decision Process (MDP) to a new framework, the Time-Varying Markov Decision Process (TVMDP). The TVMDP has a time-varying state transition model: whereas the standard MDP considers only an {\em immediate} and {\em static} description of state-transition uncertainty, the TVMDP adapts to time-varying uncertainty over a future horizon. We show how to solve a TVMDP by redesigning the MDP's value-propagation mechanism to incorporate the introduced dynamics along the temporal dimension. We validate the framework in a marine robotics navigation setting using real spatiotemporal ocean data, and show that it outperforms prior approaches that accommodate time explicitly by including it in the state.
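To make the central object concrete, the sketch below shows finite-horizon backward induction over a time-indexed transition model P(s' | s, a, t), the ingredient that distinguishes a TVMDP from a standard MDP. This is an illustrative baseline only: it effectively folds time into the state, which is exactly the prior approach the TVMDP is compared against, not the paper's value-propagation scheme. All array names and shapes are assumptions for the example.

```python
import numpy as np

def tv_backward_induction(P, R, gamma=0.95):
    """Finite-horizon value iteration with a time-varying transition model.

    P: array of shape (T, A, S, S); P[t, a, i, j] is the probability of
       moving from state i to state j when action a is taken at time t.
    R: array of shape (S,); immediate reward for occupying each state.
    Returns values V of shape (T + 1, S) and a greedy policy of shape (T, S).
    """
    T, A, S, _ = P.shape
    V = np.zeros((T + 1, S))            # terminal values V[T] are zero
    pi = np.zeros((T, S), dtype=int)
    for t in reversed(range(T)):        # sweep backward through time
        # Q[a, i]: expected return of taking action a in state i at time t,
        # computed under the transition model that holds at time t.
        Q = R[None, :] + gamma * (P[t] @ V[t + 1])
        V[t] = Q.max(axis=0)
        pi[t] = Q.argmax(axis=0)
    return V, pi

# Hypothetical toy instance: 4 time steps, 2 actions, 3 states.
rng = np.random.default_rng(0)
P = rng.random((4, 2, 3, 3))
P /= P.sum(axis=-1, keepdims=True)      # normalize rows into distributions
R = np.array([0.0, 0.0, 1.0])           # reward only in the last state
V, pi = tv_backward_induction(P, R)
```

A practical solver would replace the dense (T, A, S, S) tensor with a transition model queried on demand, e.g., from an ocean-current forecast; the sketch only fixes the semantics of time-varying transitions.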
