Approximating Optimal Policies for Agents with Limited Execution Resources

An agent with limited consumable execution resources needs policies that attempt to achieve good performance while respecting these limitations. Otherwise, an agent (such as a plane) might fail catastrophically (crash) when it runs out of resources (fuel) at the wrong time (in midair). We present a new approach to constructing policies for agents with limited execution resources that builds on principles of real-time AI as well as on research in constrained Markov decision processes. Specifically, we formulate, solve, and analyze the policy optimization problem in which constraints are imposed on the probability of exceeding the resource limits. We describe and empirically evaluate our solution technique, showing that it is computationally reasonable and that it generates policies that sacrifice some potential reward in order to provide the kind of precise guarantees about the probability of resource overutilization that are crucial for mission-critical applications.
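The problem can be made concrete with a small sketch. It takes a toy finite-horizon MDP and, for each candidate policy, computes the exact probability that total resource use exceeds a limit L. It then keeps the highest-reward policy whose overrun probability is at most alpha. In other words, it maximizes expected total reward subject to P(total resource use > L) <= alpha.

Everything in the model is a hypothetical assumption for illustration:
- the states ("cruise", "storm") and actions;
- the rewards, integer resource costs, and transition probabilities;
- the function names evaluate and best_policy.

The sketch brute-forces stationary deterministic policies. That is only feasible for tiny models, and it can miss the true constrained optimum, because constrained MDPs may require randomized policies. It is not the solution technique presented in this paper.

```python
from itertools import product

# Hypothetical toy model: all names and numbers are illustrative assumptions.
STATES = ("cruise", "storm")
ACTIONS = ("fast", "slow")
REWARD = {"fast": 3.0, "slow": 1.0}  # reward earned per step for each action
COST = {  # integer resource units (e.g. fuel) consumed per step
    ("cruise", "fast"): 2, ("cruise", "slow"): 1,
    ("storm", "fast"): 3, ("storm", "slow"): 2,
}
TRANSITIONS = {  # TRANSITIONS[state][action] -> {next_state: probability}
    "cruise": {"fast": {"cruise": 0.8, "storm": 0.2},
               "slow": {"cruise": 0.9, "storm": 0.1}},
    "storm": {"fast": {"cruise": 0.5, "storm": 0.5},
              "slow": {"cruise": 0.6, "storm": 0.4}},
}


def evaluate(policy, start, horizon, limit):
    """Return (expected total reward, P(total cost > limit)) for a stationary policy."""
    dist = {(start, 0): 1.0}  # exact joint distribution over (state, resource used so far)
    expected_reward = 0.0
    for _ in range(horizon):
        nxt = {}
        for (state, used), p in dist.items():
            action = policy[state]
            expected_reward += p * REWARD[action]
            spent = used + COST[(state, action)]
            for state2, q in TRANSITIONS[state][action].items():
                nxt[(state2, spent)] = nxt.get((state2, spent), 0.0) + p * q
        dist = nxt
    # Probability mass of trajectories whose total consumption exceeds the limit.
    overrun = sum(p for (_, used), p in dist.items() if used > limit)
    return expected_reward, overrun


def best_policy(start, horizon, limit, alpha):
    """Brute-force the best stationary deterministic policy with overrun probability <= alpha."""
    best = None
    for choice in product(ACTIONS, repeat=len(STATES)):
        policy = dict(zip(STATES, choice))
        value, risk = evaluate(policy, start, horizon, limit)
        if risk <= alpha and (best is None or value > best[1]):
            best = (policy, value, risk)
    return best  # None if no policy satisfies the constraint


if __name__ == "__main__":
    print(best_policy("cruise", horizon=4, limit=8, alpha=0.05))
```

With a horizon of 4, L = 8, and alpha = 0.05, the reward-maximizing all-"fast" policy is rejected because its overrun probability is 0.488. The search settles on "fast" in cruise and "slow" in storm, which lowers expected reward from 12.0 to 10.624 but never exceeds the limit. In miniature, this is the trade of potential reward for a guaranteed bound on overutilization described above.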
