Xingyou Song | Marius Lindauer | Aleksandra Faust | Frank Hutter | Jack Parker-Holder | Roberto Calandra | Theresa Eimer | André Biedenkapp | Yingjie Miao | Raghu Rajan | Vu Nguyen | Baohe Zhang
[1] Ingo Rechenberg, et al. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, 1973.
[2] Jonas Mockus, et al. On Bayesian Methods for Seeking the Extremum, 1974, Optimization Techniques.
[3] Richard S. Sutton, et al. Goal Seeking Components for Adaptive Intelligence: An Initial Assessment, 1981.
[4] R. Geoff Dromey, et al. An algorithm for the selection problem, 1986, Softw. Pract. Exp.
[5] Manfred Morari, et al. Model predictive control: Theory and practice, 1988.
[6] Manfred Morari, et al. Model predictive control: Theory and practice - A survey, 1989, Autom.
[7] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SIGART Bull.
[8] L. Darrell Whitley, et al. Lamarckian Evolution, The Baldwin Effect and Function Optimization, 1994, PPSN.
[9] William M. Spears, et al. Adapting Crossover in Evolutionary Algorithms, 1995, Evolutionary Programming.
[10] Zbigniew Michalewicz, et al. Genetic Algorithms + Data Structures = Evolution Programs, 1996, Springer Berlin Heidelberg.
[11] Thomas Bäck, et al. Parallel Problem Solving from Nature — PPSN V, 1998, Lecture Notes in Computer Science.
[12] Thomas Bäck, et al. An Overview of Parameter Control Methods by Self-Adaption in Evolutionary Algorithms, 1998, Fundam. Informaticae.
[13] Donald R. Jones, et al. Efficient Global Optimization of Expensive Black-Box Functions, 1998, J. Glob. Optim.
[14] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[15] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[16] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[17] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[18] Michael Kearns, et al. Bias-Variance Error Bounds for Temporal Difference Updates, 2000, COLT.
[19] Michail G. Lagoudakis, et al. Reinforcement Learning for Algorithm Selection, 2000, AAAI/IAAI.
[20] Risto Miikkulainen, et al. Evolving Neural Networks through Augmenting Topologies, 2002, Evolutionary Computation.
[21] Kenji Doya, et al. Evolution of meta-parameters in reinforcement learning algorithm, 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003).
[22] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[23] Bartlomiej Gloger, et al. Self-adaptive Evolutionary Algorithms, 2004.
[24] Peter Dayan, et al. Analytical Mean Squared Error Curves for Temporal Difference Learning, 1996, Machine Learning.
[25] Andrew G. Barto, et al. Autonomous shaping: knowledge transfer in reinforcement learning, 2006, ICML.
[26] Risto Miikkulainen, et al. Efficient Non-linear Control Through Neuroevolution, 2006, ECML.
[27] Bhaskara Marthi, et al. Automatic shaping and decomposition of reward functions, 2007, ICML '07.
[28] Charles Ofria, et al. Natural Selection Fails to Optimize Mutation Rates for Long-Term Adaptation on Rugged Fitness Landscapes, 2008, ECAL.
[29] Shimon Whiteson, et al. Generalized Domains for Empirical Evaluations in Reinforcement Learning, 2009.
[30] F. Hutter, et al. ParamILS: An Automatic Algorithm Configuration Framework, 2009, J. Artif. Intell. Res.
[31] Charles Ofria, et al. Evolving coordinated quadruped gaits with the HyperNEAT generative encoding, 2009, 2009 IEEE Congress on Evolutionary Computation.
[32] Kenneth O. Stanley, et al. A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks, 2009, Artificial Life.
[33] Scott Sanner, et al. Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda, 2010, ICML.
[34] Shimon Whiteson, et al. Multi-task evolutionary shaping without pre-specified representations, 2010, GECCO '10.
[35] Yuri Malitsky, et al. ISAC - Instance-Specific Algorithm Configuration, 2010, ECAI.
[36] Nando de Freitas, et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning, 2010, ArXiv.
[37] Kevin Leyton-Brown, et al. Hydra: Automatically Configuring Algorithms for Portfolio-Based Selection, 2010, AAAI.
[38] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[39] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.
[40] Yuval Tassa, et al. Synthesis and stabilization of complex behaviors through online trajectory optimization, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[41] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[42] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[43] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[44] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[45] Sebastian Risi, et al. Confronting the challenge of learning a flexible neural controller for a diversity of morphologies, 2013, GECCO '13.
[46] Kyrre Glette, et al. Evolving Gaits for Physical Robots with the HyperNEAT Generative Encoding: The Benefits of Simulation, 2013, EvoApplications.
[47] Juan José Murillo-Fuentes, et al. Gaussian Processes for Nonlinear Signal Processing: An Overview of Recent Advances, 2013, IEEE Signal Processing Magazine.
[48] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[49] Jonathan P. How, et al. Reinforcement learning with multi-fidelity simulators, 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).
[50] Kevin Leyton-Brown, et al. An Efficient Approach for Assessing Hyperparameter Importance, 2014, ICML.
[51] Risto Miikkulainen, et al. A Neuroevolution Approach to General Atari Game Playing, 2014, IEEE Transactions on Computational Intelligence and AI in Games.
[52] Peter I. Frazier, et al. Bayesian optimization for materials design, 2015, ArXiv.
[53] Shiguang Shan, et al. Self-Paced Curriculum Learning, 2015, AAAI.
[54] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[55] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[56] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[57] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[58] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[59] Kirthevasan Kandasamy, et al. Gaussian Process Bandit Optimisation with Multi-fidelity Evaluations, 2016, NIPS.
[60] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[62] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[63] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[64] Martha White, et al. A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning, 2016, AAMAS.
[65] Andrew Lewis, et al. The Whale Optimization Algorithm, 2016, Adv. Eng. Softw.
[66] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[67] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[68] Peter L. Bartlett, et al. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[69] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[70] Ameet Talwalkar, et al. Non-stochastic Best Arm Identification and Hyperparameter Optimization, 2015, AISTATS.
[71] Holger H. Hoos, et al. Analysing differences between algorithm configurations through ablation, 2015, Journal of Heuristics.
[72] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[73] Doina Precup, et al. The Option-Critic Architecture, 2016, AAAI.
[74] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[75] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[76] Balaraman Ravindran, et al. Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning, 2017, ICLR.
[77] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[78] Behnam Neyshabur, et al. Implicit Regularization in Deep Learning, 2017, ArXiv.
[79] Alán Aspuru-Guzik, et al. Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space, 2017, ICML.
[80] Ameet Talwalkar, et al. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, 2016, J. Mach. Learn. Res.
[81] D. Sculley, et al. Google Vizier: A Service for Black-Box Optimization, 2017, KDD.
[82] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.
[83] Misha Denil, et al. Learning to Learn without Gradient Descent by Gradient Descent, 2016, ICML.
[84] Kenneth O. Stanley, et al. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning, 2017, ArXiv.
[85] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[86] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[87] Peter Henderson, et al. Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control, 2017, ArXiv.
[88] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[89] Marius Thomas Lindauer, et al. Efficient Parameter Importance Analysis via Ablation with Surrogates, 2017, AAAI.
[90] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[91] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[92] Frank Hutter, et al. Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari, 2018, IJCAI.
[93] Marius Lindauer, et al. CAVE: Configuration Assessment, Visualization and Evaluation, 2018, LION.
[94] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[95] Nando de Freitas, et al. Bayesian Optimization in AlphaGo, 2018, ArXiv.
[96] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[97] Satinder Singh, et al. On Learning Intrinsic Rewards for Policy Gradient Methods, 2018, NeurIPS.
[98] Simon M. Lucas, et al. Evolving mario levels in the latent space of a deep convolutional generative adversarial network, 2018, GECCO.
[99] Samy Bengio, et al. A Study on Overfitting in Deep Reinforcement Learning, 2018, ArXiv.
[100] Pieter Abbeel, et al. Automatic Goal Generation for Reinforcement Learning Agents, 2017, ICML.
[101] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[102] Atil Iscen, et al. Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, 2018, Robotics: Science and Systems.
[103] Shie Mannor, et al. Learning Robust Options, 2018, AAAI.
[104] Sergey Levine, et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, 2018, NeurIPS.
[105] Jakob N. Foerster, et al. Deep multi-agent reinforcement learning, 2018.
[106] Ian Gibson, et al. Accelerating Experimental Design by Incorporating Experimenter Hunches, 2018, 2018 IEEE International Conference on Data Mining (ICDM).
[107] Benjamin Doerr, et al. Theory of Parameter Control for Discrete Black-Box Optimization: Provable Performance Gains Through Dynamic Parameter Choices, 2018, Theory of Evolutionary Computation.
[108] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[109] Kenneth O. Stanley, et al. ES is more than just a traditional finite-difference approximator, 2017, GECCO.
[110] Ilya Kostrikov, et al. Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play, 2017, ICLR.
[111] Wouter Caarls, et al. Parameters Tuning and Optimization for Reinforcement Learning Algorithms Using Evolutionary Computing, 2018, 2018 International Conference on Information Systems and Computer Science (INCISCOS).
[112] Aaron Klein, et al. BOHB: Robust and Efficient Hyperparameter Optimization at Scale, 2018, ICML.
[113] Aleksander Madry, et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift), 2018, NeurIPS.
[114] Jan N. van Rijn, et al. Hyperparameter Importance Across Datasets, 2017, KDD.
[115] Pieter Abbeel, et al. Evolved Policy Gradients, 2018, NeurIPS.
[116] Jürgen Schmidhuber, et al. Recurrent World Models Facilitate Policy Evolution, 2018, NeurIPS.
[117] Andrew Zisserman, et al. Kickstarting Deep Reinforcement Learning, 2018, ArXiv.
[118] S. Levine, et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning, 2019, CoRL.
[119] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[120] Frank Hutter, et al. Neural Architecture Search: A Survey, 2018, J. Mach. Learn. Res.
[121] Carlos Riquelme, et al. Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates, 2019, NeurIPS.
[122] Adam Gaier, et al. Weight Agnostic Neural Networks, 2019, NeurIPS.
[123] Adarsh Sehgal, et al. Deep Reinforcement Learning Using Genetic Algorithm for Parameter Optimization, 2019, 2019 Third IEEE International Conference on Robotic Computing (IRC).
[124] Dong Yan, et al. Reward Shaping via Meta-Learning, 2019, ArXiv.
[125] Marcin Andrychowicz, et al. Solving Rubik's Cube with a Robot Hand, 2019, ArXiv.
[126] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[127] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[128] Sergey Levine, et al. Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables, 2019, ICML.
[129] Alok Aggarwal, et al. Regularized Evolution for Image Classifier Architecture Search, 2018, AAAI.
[130] Frank Hutter, et al. Learning to Design RNA, 2018, ICLR.
[131] Tom Schaul, et al. Adapting Behaviour for Learning Progress, 2019, ArXiv.
[132] Richard L. Lewis, et al. Discovery of Useful Questions as Auxiliary Tasks, 2019, NeurIPS.
[133] Aleksandra Faust, et al. Learning Navigation Behaviors End-to-End With AutoRL, 2018, IEEE Robotics and Automation Letters.
[134] Risto Miikkulainen, et al. Designing neural networks through neuroevolution, 2019, Nat. Mach. Intell.
[135] Xingyou Song, et al. An Empirical Study on Hyperparameters and their Interdependence for RL Generalization, 2019, ArXiv.
[136] Marius Lindauer, et al. Pitfalls and Best Practices in Algorithm Configuration, 2017, J. Artif. Intell. Res.
[137] Shimon Whiteson, et al. Fast Efficient Hyperparameter Tuning for Policy Gradients, 2019, NeurIPS.
[138] Marius Lindauer, et al. Best Practices for Scientific Research on Neural Architecture Search, 2019, ArXiv.
[139] Nasser Mozayani, et al. Automatic construction and evaluation of macro-actions in reinforcement learning, 2019, Appl. Soft Comput.
[140] Guy Lever, et al. Emergent Coordination Through Competition, 2019, ICLR.
[141] Katja Hofmann, et al. The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors, 2019, ArXiv.
[142] Sergey Levine, et al. When to Trust Your Model: Model-Based Policy Optimization, 2019, NeurIPS.
[143] Peter Stone, et al. Building Self-Play Curricula Online by Playing with Expert Agents in Adversarial Games, 2019, 2019 8th Brazilian Conference on Intelligent Systems (BRACIS).
[144] Razvan Pascanu, et al. Deep reinforcement learning with relational inductive biases, 2018, ICLR.
[145] Kenneth O. Stanley, et al. POET: open-ended coevolution of environments and their optimized solutions, 2019, GECCO.
[146] Jakub W. Pachocki, et al. Dota 2 with Large Scale Deep Reinforcement Learning, 2019, ArXiv.
[147] David Silver, et al. On Inductive Biases in Deep Reinforcement Learning, 2019, ArXiv.
[148] Yiming Yang, et al. DARTS: Differentiable Architecture Search, 2018, ICLR.
[149] Anthony G. Francis, et al. Evolving Rewards to Automate Reinforcement Learning, 2019, ArXiv.
[150] Frank Hutter, et al. MDP Playground: Meta-Features in Reinforcement Learning, 2019, ArXiv.
[151] José Miguel Hernández-Lobato, et al. Constrained Bayesian optimization for automatic chemical design using variational autoencoders, 2019.
[152] Guy Lever, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning, 2018, Science.
[153] Taehoon Kim, et al. Quantifying Generalization in Reinforcement Learning, 2018, ICML.
[154] Ruben Villegas, et al. Learning Latent Dynamics for Planning from Pixels, 2018, ICML.
[155] Ilya Kostrikov, et al. Automatic Data Augmentation for Generalization in Deep Reinforcement Learning, 2020, ArXiv.
[156] Junhyuk Oh, et al. A Self-Tuning Actor-Critic Algorithm, 2020, NeurIPS.
[157] K. Choromanski, et al. Effective Diversity in Population-Based Reinforcement Learning, 2020, NeurIPS.
[158] Jimmy Ba, et al. Dream to Control: Learning Behaviors by Latent Imagination, 2019, ICLR.
[159] Leslie Pack Kaelbling, et al. Meta-learning curiosity algorithms, 2020, ICLR.
[160] Stephen Roberts, et al. Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits, 2020, NeurIPS.
[161] Sergey Levine, et al. Model-Based Reinforcement Learning for Atari, 2019, ICLR.
[162] R. Munos, et al. Adaptive Trade-Offs in Off-Policy Learning, 2019, AISTATS.
[163] S. Levine, et al. Conservative Q-Learning for Offline Reinforcement Learning, 2020, NeurIPS.
[164] Julian Togelius, et al. Fully Differentiable Procedural Content Generation through Generative Playing Networks, 2020, ArXiv.
[165] Shimon Whiteson, et al. Growing Action Spaces, 2019, ICML.
[166] Jan Peters, et al. Self-Paced Deep Reinforcement Learning, 2020, NeurIPS.
[167] Edward Grefenstette, et al. The NetHack Learning Environment, 2020, NeurIPS.
[168] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[169] Tor Lattimore, et al. Behaviour Suite for Reinforcement Learning, 2019, ICLR.
[170] Luisa M. Zintgraf, et al. VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning, 2019, ICLR.
[171] Junhyuk Oh, et al. Discovering Reinforcement Learning Algorithms, 2020, NeurIPS.
[172] Louis Kirsch, et al. Improving Generalization in Meta Reinforcement Learning using Learned Objectives, 2019, ICLR.
[173] O. Pietquin, et al. Munchausen Reinforcement Learning, 2020, NeurIPS.
[174] Yoshua Bengio, et al. Revisiting Fundamentals of Experience Replay, 2020, ICML.
[175] Razvan Pascanu, et al. Stabilizing Transformers for Reinforcement Learning, 2019, ICML.
[176] Scott M. Jordan, et al. Evaluating the Performance of Reinforcement Learning Algorithms, 2020, ICML.
[177] Lars Hertel, et al. Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning, 2020, ArXiv.
[178] Xingyou Song, et al. Observational Overfitting in Reinforcement Learning, 2019, ICLR.
[179] Junhyuk Oh, et al. Meta-Gradient Reinforcement Learning with an Objective Discovered Online, 2020, NeurIPS.
[180] Michael A. Osborne, et al. Knowing The What But Not The Where in Bayesian Optimization, 2019, ICML.
[181] Sameera S. Ponda, et al. Autonomous navigation of stratospheric balloons using reinforcement learning, 2020, Nature.
[182] Yujin Tang, et al. Neuroevolution of self-interpretable agents, 2020, GECCO.
[183] J. Schulman, et al. Leveraging Procedural Generation to Benchmark Reinforcement Learning, 2019, ICML.
[184] Joel Lehman, et al. Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions, 2020, ICML.
[185] Michael A. Osborne, et al. Bayesian Optimization for Iterative Learning, 2019, NeurIPS.
[186] Natasha Jaques, et al. Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design, 2020, NeurIPS.
[187] Daniel Guo, et al. Agent57: Outperforming the Atari Human Benchmark, 2020, ICML.
[188] Herke van Hoof, et al. A Performance-Based Start State Curriculum Framework for Reinforcement Learning, 2020, AAMAS.
[189] Pieter Abbeel, et al. Automatic Curriculum Learning through Value Disagreement, 2020, NeurIPS.
[190] Animesh Garg, et al. D2RL: Deep Dense Architectures in Reinforcement Learning, 2020, ArXiv.
[191] Daniel R. Jiang, et al. BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization, 2020, NeurIPS.
[192] Yujing Hu, et al. Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping, 2020, NeurIPS.
[193] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[194] Andrey Kolobov, et al. Safe Reinforcement Learning via Curriculum Induction, 2020, NeurIPS.
[195] Krzysztof Choromanski, et al. Ready Policy One: World Building Through Active Learning, 2020, ICML.
[196] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[197] Matthew E. Taylor, et al. Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey, 2020, J. Mach. Learn. Res.
[198] Lorenz Wellhausen, et al. Learning quadrupedal locomotion over challenging terrain, 2020, Science Robotics.
[199] Larry Rudolph, et al. Implementation Matters in Deep RL: A Case Study on PPO and TRPO, 2020, ICLR.
[200] John Schulman, et al. Teacher–Student Curriculum Learning, 2017, IEEE Transactions on Neural Networks and Learning Systems.
[201] Wenbo Gao, et al. ES-ENAS: Combining Evolution Strategies with Neural Architecture Search at No Extra Cost for Reinforcement Learning, 2021, ArXiv.
[202] Wojciech Zaremba, et al. Asymmetric self-play for automatic goal discovery in robotic manipulation, 2021, ArXiv.
[203] Stuart J. Russell, et al. Quantifying Differences in Reward Functions, 2020, ICLR.
[204] Frank Hutter, et al. Sample-Efficient Automated Deep Reinforcement Learning, 2020, ICLR.
[205] Fabio Ferreira, et al. Learning Synthetic Environments for Reinforcement Learning with Evolution Strategies, 2021, ArXiv.
[206] Michael A. Osborne, et al. Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces, 2021, ICML.
[207] Yevgen Chebotar, et al. Meta Learning via Learned Loss, 2019, 2020 25th International Conference on Pattern Recognition (ICPR).
[208] Johan S. Obando-Ceron, et al. Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research, 2020, ICML.
[209] Michael A. Osborne, et al. Personalized Closed-Loop Brain Stimulation for Effective Neurointervention Across Participants, 2021, bioRxiv.
[210] Marius Lindauer, et al. Self-Paced Context Evaluation for Contextual Reinforcement Learning, 2021, ICML.
[211] Frank Hutter, et al. DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization, 2021, IJCAI.
[212] Michael A. Osborne, et al. Deep reinforcement learning for efficient measurement of quantum devices, 2020, npj Quantum Information.
[213] Yuval Tassa, et al. From Motor Control to Team Play in Simulated Humanoid Football, 2021, Sci. Robotics.
[214] Fabio Petroni, et al. MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research, 2021, NeurIPS Datasets and Benchmarks.
[215] John D. Co-Reyes, et al. Evolving Reinforcement Learning Algorithms, 2021, ICLR.
[216] Sergey Levine, et al. Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning, 2020, ICLR.
[217] Edward Grefenstette, et al. Prioritized Level Replay, 2020, ICML.
[218] Max Jaderberg, et al. Faster Improvement Rate Population Based Training, 2021, ArXiv.
[219] Amr Ahmed, et al. Amazon SageMaker Automatic Model Tuning: Scalable Gradient-Free Optimization, 2020, KDD.
[220] Bodo Rosenhahn, et al. CARL: A Benchmark for Contextual and Adaptive Reinforcement Learning, 2021, ArXiv.
[221] Gergely Neu, et al. Logistic Q-Learning, 2020, AISTATS.
[222] Xingyou Song, et al. RL-DARTS: Differentiable Architecture Search for Reinforcement Learning, 2021, ArXiv.
[223] Zeb Kurth-Nelson, et al. Alchemy: A structured task distribution for meta-reinforcement learning, 2021, ArXiv.
[224] Junhyuk Oh, et al. Discovery of Options via Meta-Learned Subgoals, 2021, NeurIPS.
[225] Michael A. Osborne, et al. Revisiting Design Choices in Offline Model Based Reinforcement Learning, 2021, ICLR.
[226] Silvio Savarese, et al. Adaptive Procedural Task Generation for Hard-Exploration Problems, 2020, ICLR.
[227] Trevor Darrell, et al. Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control, 2020.
[228] Pierre-Yves Oudeyer, et al. TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL, 2021, ICML.
[229] Marius Lindauer, et al. TempoRL: Learning When to Act, 2021, ICML.
[230] Aldo Pacchiano, et al. Deep Reinforcement Learning with Dynamic Optimism, 2021, ArXiv.
[231] R. H. Sakr, et al. Optimizing hyperparameters of deep reinforcement learning for autonomous driving based on whale optimization algorithm, 2021, PLoS ONE.
[232] Frank Hutter, et al. On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning, 2021, AISTATS.
[233] F. Formenti, et al. Simulation-based optimisation to quantify heterogeneity of specific ventilation and perfusion in the lung by the Inspired Sinewave Test, 2021, Scientific Reports.
[234] Stephen Roberts, et al. Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL, 2021, NeurIPS.
[235] M. Lindauer, et al. Hyperparameters in Contextual RL are Highly Situational, 2022, ArXiv.
[236] Joshua B. Tenenbaum, et al. Learning with AMIGo: Adversarially Motivated Intrinsic Goals, 2020, ICLR.
[237] Max Jaderberg, et al. Open-Ended Learning Leads to Generally Capable Agents, 2021, ArXiv.
[238] Shengyi Huang, et al. Griddly: A platform for AI research in games, 2021, Softw. Impacts.
[239] Matthieu Geist, et al. What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study, 2021, ICLR.
[240] Satinder Singh, et al. Bootstrapped Meta-Learning, 2021, ICLR.
[241] Yevgen Chebotar, et al. Visionary: Vision architecture discovery for robot learning, 2021, 2021 IEEE International Conference on Robotics and Automation (ICRA).
[242] Edward Grefenstette, et al. Replay-Guided Adversarial Environment Design, 2021, NeurIPS.
[243] Natasha Jaques, et al. Environment Generation for Zero-Shot Compositional Reinforcement Learning, 2022, NeurIPS.