Policy invariance under reward transformations for multi-objective reinforcement learning
Sam Devlin | Jim Duggan | Patrick Mannion | Karl Mason | Enda Howley
[1] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[2] Malabika Basu, et al. Dynamic economic emission dispatch using nondominated sorting genetic algorithm-II, 2008.
[3] Jim Duggan, et al. Analysing the Effects of Reward Shaping in Multi-Objective Stochastic Games, 2017.
[4] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[5] Srini Narayanan, et al. Learning all optimal policies with multiple criteria, 2008, ICML '08.
[6] Kagan Tumer, et al. Collective Intelligence, Data Routing and Braess' Paradox, 2002, J. Artif. Intell. Res.
[7] Sam Devlin, et al. Potential-based reward shaping for knowledge-based, multi-agent reinforcement learning, 2013.
[8] Jim Duggan, et al. A Theoretical and Empirical Analysis of Reward Transformations in Multi-Objective Stochastic Games, 2017, AAMAS.
[9] Christian R. Shelton, et al. Importance sampling for reinforcement learning with multiple objectives, 2001.
[10] Joseph A. Paradiso, et al. The gesture recognition toolkit, 2014, J. Mach. Learn. Res.
[11] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[12] Ann Nowé, et al. Multi-objective reinforcement learning using sets of pareto dominating policies, 2014, J. Mach. Learn. Res.
[13] John Yearwood, et al. On the Limitations of Scalarisation for Multi-objective Reinforcement Learning of Pareto Fronts, 2008, Australasian Conference on Artificial Intelligence.
[14] Kagan Tumer, et al. Distributed agent-based air traffic flow management, 2007, AAMAS '07.
[15] Bart De Schutter, et al. Multi-agent Reinforcement Learning: An Overview, 2010.
[16] Matthew E. Taylor, et al. Multi-objectivization of reinforcement learning problems by reward shaping, 2014, 2014 International Joint Conference on Neural Networks (IJCNN).
[17] Michael Wooldridge, et al. Introduction to multiagent systems, 2001.
[18] Shimon Whiteson, et al. A Survey of Multi-Objective Sequential Decision-Making, 2013, J. Artif. Intell. Res.
[19] Jim Duggan, et al. An Experimental Review of Reinforcement Learning Algorithms for Adaptive Traffic Signal Control, 2016, Autonomic Road Transport Support Systems.
[20] Yoav Shoham, et al. If multi-agent learning is the answer, what is the question?, 2007, Artif. Intell.
[21] Sam Devlin, et al. An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems, 2011, Adv. Complex Syst.
[22] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[23] Sam Devlin, et al. Theoretical considerations of potential-based reward shaping for multi-agent systems, 2011, AAMAS.
[24] Chris Watkins, et al. Learning from delayed rewards, 1989.
[25] Marek Grzes, et al. Reward Shaping in Episodic Reinforcement Learning, 2017, AAMAS.
[26] V. Pareto. Manual of Political Economy: A Critical and Variorum Edition, 2014.
[27] Babita Majhi, et al. Multiobjective optimization based adaptive models with fuzzy decision making for stock market forecasting, 2015, Neurocomputing.
[28] Alice E. Smith, et al. Penalty functions, 1996.
[29] Sam Devlin, et al. Multi-Objective Dynamic Dispatch Optimisation using Multi-Agent Reinforcement Learning (Extended Abstract), 2016, AAMAS.
[30] Evan Dekker, et al. Empirical evaluation methods for multiobjective reinforcement learning algorithms, 2011, Machine Learning.
[31] David H. Wolpert, et al. Collective Intelligence, 1999.
[32] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.
[33] Yan Shi, et al. Multiobjective optimization technique for demand side management with load balancing approach in smart grid, 2016, Neurocomputing.
[34] Sam Devlin, et al. Dynamic potential-based reward shaping, 2012, AAMAS.