Multi-objective Reinforcement Learning for the Expected Utility of the Return
[1] Michèle Sebag,et al. Multi-objective Monte-Carlo Tree Search , 2012, ACML.
[2] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[3] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[4] Ann Nowé,et al. Designing multi-objective multi-armed bandits algorithms: A study , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).
[5] Peter Geibel,et al. Reinforcement Learning for MDPs with Constraints , 2006, ECML.
[6] Eyke Hüllermeier,et al. Preference-based reinforcement learning: a formal framework and a policy iteration algorithm , 2012, Mach. Learn.
[7] Peter Auer,et al. Pareto Front Identification from Stochastic Bandit Feedback , 2016, AISTATS.
[8] Shlomo Zilberstein,et al. Multi-Objective POMDPs with Lexicographic Reward Preferences , 2015, IJCAI.
[9] Stefan Schaal,et al. Reinforcement learning of motor skills with policy gradients , 2008 .
[10] Marco Wiering,et al. Model-based multi-objective reinforcement learning , 2014, 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[11] Ann Nowé,et al. Interactive Thompson Sampling for Multi-objective Multi-armed Bandits , 2017, ADT.
[12] Ann Nowé,et al. Multi-objective reinforcement learning using sets of pareto dominating policies , 2014, J. Mach. Learn. Res..
[13] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[14] Andrei V. Kelarev,et al. Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks , 2009, Australasian Conference on Artificial Intelligence.
[15] Peter Vamplew,et al. Steering approaches to Pareto-optimal multiobjective reinforcement learning , 2017, Neurocomputing.
[16] Michèle Sebag,et al. Preference-Based Policy Learning , 2011, ECML/PKDD.
[17] Pablo Hernandez-Leal,et al. Learning on a Budget Using Distributional RL , 2018 .
[18] Pieter Libin,et al. Interactive multi-objective reinforcement learning in multi-armed bandits for any utility function , 2020 .
[19] Shlomo Zilberstein,et al. Multi-Objective MDPs with Conditional Lexicographic Reward Preferences , 2015, AAAI.
[20] Shimon Whiteson,et al. Multi-Objective Decision Making , 2017, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[21] Peter Vamplew,et al. MORL-Glue: a benchmark suite for multi-objective reinforcement learning , 2017 .
[22] Shimon Whiteson,et al. A Survey of Multi-Objective Sequential Decision-Making , 2013, J. Artif. Intell. Res..
[23] Eyke Hüllermeier,et al. Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm , 2014, Machine Learning.
[24] Michèle Sebag,et al. APRIL: Active Preference-learning based Reinforcement Learning , 2012, ECML/PKDD.
[25] Susan A. Murphy,et al. Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Controlled Trial Analysis , 2010, ICML.
[26] Peter Vrancx,et al. Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets , 2017, AAAI.
[27] Shimon Whiteson,et al. Point-Based Planning for Multi-Objective POMDPs , 2015, IJCAI.