Multi-criteria Reinforcement Learning

We consider multi-criteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given for a special, but important class of problems when the evaluation of policies can be computed for the criteria independently of each other. The analysis requires special care as the topology introduced by pointwise convergence and the order-topology introduced by the preference order are in general incompatible. Reinforcement learning algorithms are proposed and analyzed.
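To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of tabular Q-learning with vector-valued Q-functions, where the fixed total ordering is taken to be lexicographic and each criterion's value is updated independently of the others. The toy MDP, the two criteria, and all constants below are illustrative assumptions.

```python
import random

# Hypothetical 2-criterion problem: criterion 0 dominates criterion 1 under a
# lexicographic total order on value vectors. The MDP below is a made-up toy.
N_STATES, N_ACTIONS, N_CRITERIA = 4, 2, 2
GAMMA, ALPHA = 0.9, 0.1

# Q[s][a] is a vector of per-criterion action values, learned independently.
Q = [[[0.0] * N_CRITERIA for _ in range(N_ACTIONS)] for _ in range(N_STATES)]

def step(s, a):
    """Toy deterministic transition with a vector-valued reward."""
    s2 = (s + a + 1) % N_STATES
    r = (1.0 if s2 != 0 else -1.0,  # criterion 0: avoid state 0
         float(s2))                 # criterion 1: prefer high-index states
    return s2, r

def greedy(s):
    # Python tuples compare lexicographically, realizing the total order.
    return max(range(N_ACTIONS), key=lambda a: tuple(Q[s][a]))

random.seed(0)
s = 0
for _ in range(5000):
    a = random.randrange(N_ACTIONS) if random.random() < 0.1 else greedy(s)
    s2, r = step(s, a)
    a2 = greedy(s2)  # bootstrap from the lexicographically greedy action
    for c in range(N_CRITERIA):  # independent per-criterion updates
        Q[s][a][c] += ALPHA * (r[c] + GAMMA * Q[s2][a2][c] - Q[s][a][c])
    s = s2
```

The key point the sketch illustrates is that the per-criterion value updates never mix criteria; only action selection consults the preference order, by comparing the whole value vectors.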
