Multi-criteria Reinforcement Learning

We consider multi-criteria sequential decision making problems where the vector-valued evaluations are compared by a given, fixed total ordering. Conditions for the optimality of stationary policies and the Bellman optimality equation are given for a special, but important class of problems when the evaluation of policies can be computed for the criteria independently of each other. The analysis requires special care as the topology introduced by pointwise convergence and the order-topology introduced by the preference order are in general incompatible. Reinforcement learning algorithms are proposed and analyzed.
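To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of tabular Q-learning with vector-valued Q-functions, where the fixed total ordering is taken to be lexicographic and each criterion's value is updated independently of the others. The toy MDP, the two criteria, and all constants below are illustrative assumptions.

```python
import random

# Hypothetical 2-criterion problem: criterion 0 dominates criterion 1 under a
# lexicographic total order on value vectors. The MDP below is a made-up toy.
N_STATES, N_ACTIONS, N_CRITERIA = 4, 2, 2
GAMMA, ALPHA = 0.9, 0.1

# Q[s][a] is a vector of per-criterion action values, learned independently.
Q = [[[0.0] * N_CRITERIA for _ in range(N_ACTIONS)] for _ in range(N_STATES)]

def step(s, a):
    """Toy deterministic transition with a vector-valued reward."""
    s2 = (s + a + 1) % N_STATES
    r = (1.0 if s2 != 0 else -1.0,  # criterion 0: avoid state 0
         float(s2))                 # criterion 1: prefer high-index states
    return s2, r

def greedy(s):
    # Python tuples compare lexicographically, realizing the total order.
    return max(range(N_ACTIONS), key=lambda a: tuple(Q[s][a]))

random.seed(0)
s = 0
for _ in range(5000):
    a = random.randrange(N_ACTIONS) if random.random() < 0.1 else greedy(s)
    s2, r = step(s, a)
    a2 = greedy(s2)  # bootstrap from the lexicographically greedy action
    for c in range(N_CRITERIA):  # independent per-criterion updates
        Q[s][a][c] += ALPHA * (r[c] + GAMMA * Q[s2][a2][c] - Q[s][a][c])
    s = s2
```

The key point the sketch illustrates is that the per-criterion value updates never mix criteria; only action selection consults the preference order, by comparing the whole value vectors.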
