Sandy H. Huang | Raia Hadsell | Martin A. Riedmiller | Abbas Abdolmaleki | Jost Tobias Springenberg | Nicolas Heess | Konstantinos Bousmalis | Csaba Szepesvari | TB Dhruva | Shruti Mishra | Giulia Vezzani | Arunkumar Byravan | Bobak Shahriari | Andras Gyorgy
[1] M. Zuluaga,et al. ε-PAL: an active learning approach to the multi-objective optimization problem , 2016.
[2] Sergey Levine,et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning , 2019, ArXiv.
[3] Doina Precup,et al. Off-Policy Deep Reinforcement Learning without Exploration , 2018, ICML.
[4] Yasemin Altun,et al. Relative Entropy Policy Search , 2010.
[5] Ruslan Salakhutdinov,et al. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.
[6] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[7] S. Levine,et al. Accelerating Online Reinforcement Learning with Offline Datasets , 2020, ArXiv.
[8] David Levine,et al. Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning , 2007, NIPS.
[9] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[10] Runzhe Yang,et al. A Generalized Algorithm for Multi-Objective RL and Policy Adaptation , 2019.
[11] Sergio Gomez Colmenarejo,et al. Acme: A Research Framework for Distributed Reinforcement Learning , 2020, ArXiv.
[12] Martin A. Riedmiller,et al. Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning , 2020, ICLR.
[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[14] Bernhard Sendhoff,et al. On Test Functions for Evolutionary Multi-objective Optimization , 2004, PPSN.
[15] Wojciech Matusik,et al. Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control , 2020, ICML.
[16] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[17] Siddhartha Srinivasa,et al. Imitation Learning as f-Divergence Minimization , 2019, WAFR.
[18] Sergey Levine,et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review , 2018, ArXiv.
[19] Andrew Zisserman,et al. Kickstarting Deep Reinforcement Learning , 2018, ArXiv.
[20] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[21] Yuval Tassa,et al. Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.
[22] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[23] David Silver,et al. Online and Offline Reinforcement Learning by Planning with a Learned Model , 2021, NeurIPS.
[24] Sergio Gomez Colmenarejo,et al. RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning , 2020.
[25] Sergey Levine,et al. QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.
[26] Jan Peters,et al. Manifold-based multi-objective policy search with sample reuse , 2017, Neurocomputing.
[27] Qiuyi Zhang,et al. Random Hypervolume Scalarizations for Provable Multi-Objective Black Box Optimization , 2020, ICML.
[28] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[29] Stuart J. Russell,et al. Q-Decomposition for Reinforcement Learning Agents , 2003, ICML.
[30] H. Francis Song,et al. A Distributional View on Multi-Objective Policy Optimization , 2020, ICML.
[31] Sergey Levine,et al. DeepMimic , 2018, ACM Trans. Graph.
[32] Jackie Kay,et al. Learning Dexterous Manipulation from Suboptimal Experts , 2020, ArXiv.
[33] Yee Whye Teh,et al. Information asymmetry in KL-regularized RL , 2019, ICLR.
[34] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[35] Evan Dekker,et al. Empirical evaluation methods for multiobjective reinforcement learning algorithms , 2011, Machine Learning.
[36] Tom Schaul,et al. Reinforcement Learning with Unsupervised Auxiliary Tasks , 2016, ICLR.
[37] Nando de Freitas,et al. Critic Regularized Regression , 2020, NeurIPS.
[38] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[39] Ann Nowé,et al. Scalarized multi-objective reinforcement learning: Novel design techniques , 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[40] Ann Nowé,et al. Multi-objective reinforcement learning using sets of pareto dominating policies , 2014, J. Mach. Learn. Res.
[41] Martin A. Riedmiller,et al. Batch Reinforcement Learning , 2012, Reinforcement Learning.
[42] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[43] Marcello Restelli,et al. Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation , 2016, J. Artif. Intell. Res.
[44] J. Dennis,et al. A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems , 1997.
[45] Sergey Levine,et al. D4RL: Datasets for Deep Data-Driven Reinforcement Learning , 2020, ArXiv.
[46] Razvan Pascanu,et al. Policy Distillation , 2015, ICLR.
[47] Sonia Chernova,et al. Reinforcement Learning from Demonstration through Shaping , 2015, IJCAI.
[48] Shimon Whiteson,et al. A Survey of Multi-Objective Sequential Decision-Making , 2013, J. Artif. Intell. Res.
[49] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res.
[50] Nicolas Le Roux,et al. An operator view of policy gradient methods , 2020, NeurIPS.