Robust Adaptive Markov Decision Processes: Planning with Model Uncertainty

The ability of autonomous systems to make complex decisions is becoming an increasingly commonplace requirement in many cooperative control operations, including the management of teams of robots such as unmanned aerial vehicles (UAVs). Central to this research is the need to optimize vehicle decisions, such as route planning and the allocation of team resources, while operating in a dynamic and uncertain environment. Even with the advent of increasingly sophisticated onboard sensors that improve information about the surroundings, uncertainty remains a ubiquitous feature of UAV applications and a key issue in UAV research.
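As a rough illustration of the kind of planning under model uncertainty addressed here, the sketch below implements a scenario-based robust value iteration: each Bellman backup is taken against the worst case over a finite set of candidate transition models. The function names, problem sizes, and the choice of a finite scenario set (rather than, say, the interval or likelihood-based uncertainty sets common in the robust MDP literature) are assumptions made purely for illustration, not details taken from this work.

```python
import numpy as np

# Minimal sketch of robust value iteration over a finite set of candidate
# transition models (a scenario-based uncertainty set). The toy problem
# below is an illustrative assumption, not a model from the paper.

def robust_value_iteration(P_models, R, gamma=0.95, tol=1e-6, max_iter=1000):
    """P_models: array (K, S, A, S) of K candidate transition models.
    R: array (S, A) of rewards. Returns worst-case values and a robust policy."""
    K, S, A, _ = P_models.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q[k, s, a]: value of action a in state s under candidate model k
        Q = R[None, :, :] + gamma * np.einsum('ksat,t->ksa', P_models, V)
        # Robust backup: worst case over models, then best action
        Q_robust = Q.min(axis=0)
        V_new = Q_robust.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q_robust.argmax(axis=1)
    return V, policy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, K = 4, 2, 3                      # small toy problem
    P = rng.random((K, S, A, S))
    P /= P.sum(axis=-1, keepdims=True)     # normalize rows into distributions
    R = rng.random((S, A))
    V, pi = robust_value_iteration(P, R)
    print("worst-case values:", np.round(V, 3))
    print("robust policy:", pi)
```

The min-over-models step is what distinguishes the robust backup from a standard value iteration; with a single candidate model (K = 1) the sketch reduces to the usual Bellman recursion.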
