Constructing optimal policies for agents with constrained architectures

Optimal behavior is a highly desirable property of autonomous agents [13] and, as such, has received much attention over the years. However, making optimal decisions and executing optimal actions typically requires substantial effort on the part of an agent, and in some situations the agent might lack the sensory, computational, or actuating resources needed to carry out the optimal policy. In such cases, the agent has to do the best it can given its architectural constraints. We distinguish three ways in which an agent’s architecture can affect policy optimality: an agent might have limitations that impair its ability to formulate, operationalize (convert to an internal representation), or execute an optimal policy. In this paper, we focus on agents facing the latter two types of limitations. We adopt the transient [7] constrained Markov decision problem (CMDP) framework [2, 11] in our search for optimal policies and show how gradations of increasingly constrained architectures create harder optimization problems, ranging from polynomial-time solvable to NP-complete. We also present algorithms based on linear and integer programming that work across a range of such constrained optimization problems. The contribution of the full paper [5] is a characterization of a portion of the landscape of constrained agent architectures in terms of the complexity of finding optimal policies, together with algorithms for doing so. The new results of greatest interest include the complexity proof and the algorithm for finding deterministic policies under linear execution constraints, the analysis of operationalization constraints on action utilization costs, and an algorithm for approximating optimal policies that bounds the probability of exceeding upper limits on the total costs of the policy. Here we summarize this work.
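
As a concrete illustration of the linear-programming approach, the sketch below solves a tiny transient CMDP via the standard occupancy-measure LP: maximize expected total reward, subject to flow-conservation constraints on the occupancy variables x(s,a) and a linear bound on expected total cost; a (possibly randomized) optimal policy is then read off as pi(a|s) proportional to x(s,a). This is a minimal sketch under made-up assumptions, not the paper's implementation: the toy model data (P, R, C, the cost budget, and the per-step continuation probability gamma that makes the process transient) are all hypothetical, and scipy.optimize.linprog is just one convenient solver.

```python
import numpy as np
from scipy.optimize import linprog

# Toy transient CMDP (all numbers are illustrative, not from the paper).
S, A = 2, 2                      # number of states and actions
gamma = 0.9                      # continuation prob per step; stopping w.p. 1-gamma makes the chain transient
P = np.zeros((S, A, S))          # P[s, a, s'] = transition probability
P[0, 0] = [0.5, 0.5]
P[0, 1] = [0.1, 0.9]
P[1, 0] = [0.8, 0.2]
P[1, 1] = [0.3, 0.7]
R = np.array([[1.0, 4.0],        # R[s, a] = expected reward per step
              [0.5, 2.0]])
C = np.array([[0.0, 3.0],        # C[s, a] = resource cost per step
              [0.0, 1.0]])
budget = 2.0                     # upper bound on total expected cost
alpha = np.array([1.0, 0.0])     # initial-state distribution

n = S * A                        # one occupancy variable x[s, a] per state-action pair
idx = lambda s, a: s * A + a

# Flow conservation:
#   sum_a x[s,a] - gamma * sum_{s',a} P[s',a,s] * x[s',a] = alpha[s]  for every s
A_eq = np.zeros((S, n))
for s in range(S):
    for a in range(A):
        A_eq[s, idx(s, a)] += 1.0
    for sp in range(S):
        for a in range(A):
            A_eq[s, idx(sp, a)] -= gamma * P[sp, a, s]
b_eq = alpha

# Single linear cost constraint: total expected cost <= budget
A_ub = C.reshape(1, n)
b_ub = [budget]

# linprog minimizes, so negate the reward vector to maximize expected reward
res = linprog(-R.reshape(n), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
x = res.x.reshape(S, A)
pi = x / x.sum(axis=1, keepdims=True)   # randomized policy: pi(a|s) ~ x[s,a]
print("expected total reward:", -res.fun)
print("policy:\n", pi)
```

Note that the cost budget may force the LP solution to randomize between actions, which is exactly why randomized optimal policies for CMDPs can be found in polynomial time. Requiring each state to place all of its occupancy on a single action turns this LP into an integer program, mirroring the jump to NP-completeness described above for deterministic policies under execution constraints.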

[1] Keith W. Ross, et al. Markov Decision Processes with Sample Path Constraints: The Communicating Case, 1989, Oper. Res.

[2] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.

[3] E. Altman, et al. Adaptive control of constrained Markov chains: Criteria and policies, 1991.

[4] William J. Cook, et al. Combinatorial Optimization, 1997.

[5] Edmund H. Durfee, et al. Satisficing strategies for resource-limited policy search in dynamic environments, 2002, AAMAS '02.

[6] R. E. Griffith, et al. A Nonlinear Programming Technique for the Optimization of Continuous Processing Systems, 1961.

[7] David J. Musliner, et al. World Modeling for the Dynamic Construction of Real-Time Control Plans, 1995, Artif. Intell.

[8] Ying Huang, et al. On Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs, 1994, Math. Oper. Res.

[9] S. Griffis. Editor, 1997, Journal of Navigation.

[10] F. d'Epenoux, et al. A Probabilistic Production and Inventory Problem, 1963.

[11] George B. Dantzig, et al. Linear Programming and Extensions, 1965.

[12] E. Altman. Constrained Markov Decision Processes, 1999.

[13] Narendra Karmarkar, et al. A new polynomial-time algorithm for linear programming, 1984, Combinatorica.

[14] L. Khachiyan. Polynomial algorithms in linear programming, 1980.

[15] Leslie Pack Kaelbling, et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.

[16] Alan F. Blackwell, et al. Programming, 1973, CSC '73.

[17] L. C. M. Kallenberg. Linear programming and finite Markovian control problems, 1984.

[18] David J. Musliner, et al. CIRCA: a cooperative intelligent real-time control architecture, 1993, IEEE Trans. Syst. Man Cybern.

[19] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.

[20] Devika Subramanian, et al. Provably Bounded Optimal Agents, 1993, IJCAI.

[21] G. Nemhauser, et al. Integer Programming, 2020.

[22] Keith W. Ross, et al. Optimal scheduling of interactive and noninteractive traffic in telecommunication systems, 1988.

[23] R. Bellman. Dynamic Programming, 1957, Science.

[24] V. Klee, et al. How Good Is the Simplex Algorithm?, 1970.

[25] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[26] L. G. Khachiyan. A polynomial algorithm in linear programming, 1979.

[27] Edmund H. Durfee, et al. Approximating Optimal Policies for Agents with Limited Execution Resources, 2003, IJCAI.

[28] M. J. Sobel. Maximal mean/standard deviation ratio in an undiscounted MDP, 1985.

[29] Laurence A. Wolsey, et al. Integer and Combinatorial Optimization, 1988.

[30] Dimitri P. Bertsekas. Nonlinear Programming, 1997.