[1] Stefan Ultes et al. Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning, 2017, SIGDIAL Conference.
[2] Navdeep Jaitly et al. Discrete Sequential Prediction of Continuous Actions for Deep RL, 2017, arXiv.
[3] Andrea Castelletti et al. Multi-objective fitted Q-iteration: Pareto frontier approximation in one single run, 2011, 2011 International Conference on Networking, Sensing and Control.
[4] Tuomas Sandholm et al. Preference elicitation in combinatorial auctions, 2002, EC '01.
[5] W. A. Kirk et al. An Introduction to Metric Spaces and Fixed Point Theory, 2001.
[6] Andrew Y. Ng et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[7] Li Chen et al. Survey of Preference Elicitation Methods, 2004.
[8] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[9] Ann Nowé et al. Scalarized multi-objective reinforcement learning: Novel design techniques, 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[10] Hirotaka Nakayama et al. Sequential Approximate Multiobjective Optimization Using Computational Intelligence, 2009, Vector Optimization.
[11] Pieter Abbeel et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[12] Shimon Whiteson et al. Multi-Objective Deep Reinforcement Learning, 2016, arXiv.
[13] David Silver et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[14] Dewen Hu et al. Multiobjective Reinforcement Learning: A Comprehensive Overview, 2015, IEEE Transactions on Systems, Man, and Cybernetics: Systems.
[15] Dimitri P. Bertsekas. Abstract Dynamic Programming, 2013.
[16] John Schulman et al. Concrete Problems in AI Safety, 2016, arXiv.
[17] Shane Legg et al. Human-level control through deep reinforcement learning, 2015, Nature.
[18] L. Watson et al. Modern homotopy methods in optimization, 1989.
[19] Manabu Yoshida et al. Parallel reinforcement learning for weighted multi-criteria model with adaptive margin, 2007, Cognitive Neurodynamics.
[20] Evan Dekker et al. Empirical evaluation methods for multiobjective reinforcement learning algorithms, 2011, Machine Learning.
[21] David W. Coit et al. Multi-objective optimization using genetic algorithms: A tutorial, 2006, Reliab. Eng. Syst. Saf.
[22] Dimitri P. Bertsekas. Regular Policies in Abstract Dynamic Programming, 2016, SIAM J. Optim.
[23] Shie Mannor et al. The Steering Approach for Multi-Criteria Reinforcement Learning, 2001, NIPS.
[24] Alex Graves et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[25] Yasuaki Kuroe et al. Multi-objective reinforcement learning for acquiring all Pareto optimal policies simultaneously - Method of determining scalarization weights, 2014, 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC).
[26] Sean R. Eddy. What is dynamic programming?, 2004, Nature Biotechnology.
[27] I. Kim et al. Adaptive weighted sum method for multiobjective optimization: a new method for Pareto front generation, 2006.
[28] Stefano Ermon et al. Generative Adversarial Imitation Learning, 2016, NIPS.
[29] Marcin Andrychowicz et al. Hindsight Experience Replay, 2017, NIPS.
[30] Sriraam Natarajan et al. Dynamic preferences in multi-criteria reinforcement learning, 2005, ICML.
[31] Peter Dayan et al. Technical Note: Q-Learning, 1992, Machine Learning.
[32] JiGuan G. Lin. On min-norm and min-max methods of multi-objective optimization, 2005, Math. Program.
[33] Andrea Castelletti et al. Tree-based Fitted Q-iteration for Multi-Objective Markov Decision problems, 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).
[34] Steve J. Young et al. The Hidden Agenda User Simulation Model, 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[35] Daphne Koller et al. Learning an Agent's Utility Function by Observing Behavior, 2001, ICML.
[36] Marc G. Bellemare et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[37] David Vandyke et al. PyDial: A Multi-domain Statistical Dialogue System Toolkit, 2017, ACL.
[38] Jan Peters et al. Manifold-based multi-objective policy search with sample reuse, 2017, Neurocomputing.
[39] Craig Boutilier et al. A POMDP formulation of preference elicitation problems, 2002, AAAI/IAAI.
[40] David Levine et al. Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning, 2007, NIPS.
[41] Marcello Restelli et al. Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation, 2014, AAAI.
[42] Geoffrey E. Hinton et al. Visualizing Data using t-SNE, 2008.
[43] Arch W. Naylor et al. Linear Operator Theory in Engineering and Science, 1971.
[44] Shimon Whiteson et al. A Survey of Multi-Objective Sequential Decision-Making, 2013, J. Artif. Intell. Res.
[45] Tom Lenaerts et al. Dynamic Weights in Multi-Objective Deep Reinforcement Learning, 2018, ICML.
[46] Yuval Tassa et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[47] Tom Schaul et al. Prioritized Experience Replay, 2015, ICLR.
[48] Sergey Levine et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[49] Srini Narayanan et al. Learning all optimal policies with multiple criteria, 2008, ICML '08.