The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes

One of the most fundamental problems in Markov decision processes is analysis and control synthesis for safety and reachability specifications. We consider the stochastic reach-avoid problem, in which the objective is to synthesize a control policy that maximizes the probability of reaching a target set at a given time while staying in a safe set at all prior times. We characterize the solution to this problem through an infinite dimensional linear program. We then develop a tractable approximation to the infinite dimensional linear program through finite dimensional approximations of the decision space and constraints. For a large class of Markov decision processes whose transition kernels are Gaussian mixtures, we show that a proper selection of the finite dimensional space further reduces the computational complexity of the resulting linear program. We validate the proposed method and analyze its potential with a series of numerical case studies.
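The value function characterized by the linear program above also satisfies a backward dynamic programming recursion: at the horizon it equals the indicator of the target set, and at earlier stages it is 1 on the target, 0 outside the safe set, and otherwise the best one-step expected continuation value. As an illustration only (the paper's contribution is the LP formulation for general, continuous-space kernels; the finite MDP, state/action sets, and transition matrices below are hypothetical toy data), the recursion can be sketched for a finite MDP:

```python
import numpy as np

def reach_avoid_value(P, safe, target, horizon):
    """Backward recursion for the maximal reach-avoid probability on a finite MDP.

    P       : array (A, S, S); P[a, x, y] is the probability of moving
              from state x to state y under action a.
    safe    : boolean array (S,), the safe set K.
    target  : boolean array (S,), the target set K' (assumed to lie in K).
    horizon : number of decision stages N.

    Returns V_0 (max probability of reaching K' by time N while staying
    in K) and the stage-wise greedy policies.
    """
    V = target.astype(float)              # V_N(x) = 1_{K'}(x)
    policies = []
    for _ in range(horizon):
        Q = P @ V                         # Q[a, x] = sum_y P[a, x, y] V(y)
        policies.append(Q.argmax(axis=0))
        # Inside K but outside K': continue optimally; outside K: 0.
        cont = np.where(safe & ~target, Q.max(axis=0), 0.0)
        V = np.where(target, 1.0, cont)   # on K' the value is 1
    return V, policies[::-1]

# Toy 3-state example: state 0 is safe, state 1 is the (absorbing) target,
# state 2 is unsafe and absorbing. Action 0 is a risky jump toward the
# target; action 1 mostly waits in the safe state.
P = np.array([
    [[0.0, 0.4, 0.6], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # action 0
    [[0.8, 0.2, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # action 1
])
safe = np.array([True, True, False])
target = np.array([False, True, False])
V, _ = reach_avoid_value(P, safe, target, horizon=3)
```

With a longer horizon the patient action 1 becomes preferable from state 0, which is exactly the kind of stage-dependent trade-off the value function encodes; the infinite dimensional LP in the paper characterizes this same value function without enumerating states.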
