Multi-criteria Reinforcement Learning
[1] L. G. Mitten. Composition Principles for Synthesis of Optimal Multistage Processes, 1964.
[2] T. A. Brown et al. Dynamic Programming in Multiplicative Lattices, 1965.
[3] E. Denardo. Contraction Mappings in the Theory Underlying Dynamic Programming, 1967.
[4] E. Frid. On Optimal Strategies in Control Problems with Constraints, 1972.
[5] D. Bertsekas. Monotone Mappings with Application in Dynamic Programming, 1977.
[6] M. I. Henig. Vector-Valued Dynamic Programming, 1983.
[7] E. Altman et al. Adaptive Control of Constrained Markov Chains: Criteria and Policies, 1991.
[8] Peter A. Streufert. Ordinal Dynamic Programming, 1991.
[9] Sebastian Thrun et al. The Role of Exploration in Learning Control, 1992.
[10] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[11] Minoru Asada et al. Coordination of Multiple Behaviors Acquired by a Vision-Based Reinforcement Learning, 1994, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '94).
[12] Matthias Heger et al. Consideration of Risk in Reinforcement Learning, 1994, ICML.
[13] Eugene A. Feinberg et al. Constrained Markov Decision Models with Weighted Discounted Rewards, 1995, Math. Oper. Res.
[14] Csaba Szepesvári et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[15] Matthias Heger. The Loss from Imperfect Value Functions in Expectation-Based and Minimax-Based Tasks, 1996, Machine Learning.
[16] Satinder P. Singh et al. How to Dynamically Merge Markov Decision Processes, 1997, NIPS.
[17] Csaba Szepesvári. Non-Markovian Policies in Sequential Decision Problems, 1998, Acta Cybern.