Paper Information - Reinforcement Learning for MDPs with Constraints

In this article, I consider Markov decision processes (MDPs) with two criteria, each defined as the expected value of an infinite-horizon cumulative return. The second criterion is either itself subject to an inequality constraint, or there is a maximum allowable probability that the individual returns violate the constraint. I describe and discuss three new reinforcement learning approaches for solving such control problems.
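
The two problem variants described above can be written out as a rough sketch. The notation here is an assumption made for illustration and is not taken from the article: $\pi$ is a policy, $r_t$ and $c_t$ are the per-step signals of the first and second criterion, $\gamma$ is a discount factor, $\alpha$ is the constraint threshold, and $\beta$ is the maximum allowable violation probability. Maximizing the first criterion and using an upper bound on the second are also assumed conventions.

```latex
% Cumulative returns of the two criteria (discounting is an assumption)
R = \sum_{t=0}^{\infty} \gamma^{t} r_t ,
\qquad
C = \sum_{t=0}^{\infty} \gamma^{t} c_t

% Variant 1: inequality constraint on the expected second return
\max_{\pi} \; \mathbb{E}_{\pi}[R]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}[C] \le \alpha

% Variant 2: bounded probability that an individual return violates the constraint
\max_{\pi} \; \mathbb{E}_{\pi}[R]
\quad \text{subject to} \quad
\Pr\nolimits_{\pi}\!\left( C > \alpha \right) \le \beta
```

In the first variant only the expectation of the second return is constrained, so individual trajectories may still exceed $\alpha$. The second variant bounds how often that may happen.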

Peter Geibel
