A Solution to Time-Varying Markov Decision Processes

We consider a decision-making problem in which the environment varies in both space and time. Such problems arise naturally in, for example, the navigation of an underwater robot amid ocean currents or of an aerial vehicle in wind. To model this spatiotemporal variation, we extend the standard Markov decision process (MDP) to a new framework called the time-varying Markov decision process (TVMDP). The TVMDP is equipped with a time-varying state transition model: whereas the standard MDP captures only an immediate, static description of state-transition uncertainty, the TVMDP adapts to transition dynamics that evolve over a future planning horizon. We show how to solve a TVMDP by redesigning the MDP value propagation mechanism so that it incorporates these dynamics along the temporal dimension. We validate the framework in a marine robot navigation setting using spatiotemporal ocean data and show that it outperforms prior methods.
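To make the idea of propagating values along the temporal dimension concrete, the snippet below gives a minimal sketch, not the paper's algorithm: it assumes the horizon has been discretized into T steps and that a time-indexed transition tensor P[t] (e.g., derived from a spatiotemporal ocean-current forecast) is available. The function name solve_tvmdp, the array shapes, and the zero terminal-value condition are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): backward value iteration
# on a TVMDP with a discretized planning horizon. Each sweep uses the
# transition model forecast for its own time step rather than a single
# static model, as in a standard MDP.
import numpy as np

def solve_tvmdp(P, R, gamma=0.95):
    """P: (T, A, S, S) time-varying transition model, P[t, a, s, s'].
       R: (S, A) immediate rewards.
       Returns values V of shape (T+1, S) and a greedy policy pi of shape (T, S)."""
    T, A, S, _ = P.shape
    V = np.zeros((T + 1, S))          # terminal values fixed at zero (assumption)
    pi = np.zeros((T, S), dtype=int)
    for t in reversed(range(T)):      # propagate values backward in time
        # Q[a, s] = R[s, a] + gamma * sum_{s'} P[t, a, s, s'] * V[t+1, s']
        Q = R.T + gamma * P[t] @ V[t + 1]
        V[t] = Q.max(axis=0)
        pi[t] = Q.argmax(axis=0)
    return V, pi

# Toy usage: 3 states, 2 actions, 4 time steps with randomly varying dynamics.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(4, 2, 3))   # each P[t, a, s, :] sums to 1
R = rng.standard_normal((3, 2))
V, pi = solve_tvmdp(P, R)
```

Under these assumptions the backward recursion plays the role of ordinary value iteration, with the single static transition matrix replaced by one forecast per time step.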
