暂无分享,去创建一个
[1] Jürgen Schmidhuber,et al. A ‘Self-Referential’ Weight Matrix , 1993 .
[2] Filip De Turck,et al. VIME: Variational Information Maximizing Exploration , 2016, NIPS.
[3] Jascha Sohl-Dickstein,et al. Learning Unsupervised Learning Rules , 2018, ArXiv.
[4] Pieter Abbeel,et al. Meta-Learning with Temporal Convolutions , 2017, ArXiv.
[5] Satinder Singh,et al. On Learning Intrinsic Rewards for Policy Gradient Methods , 2018, NeurIPS.
[6] Yevgen Chebotar,et al. Meta Learning via Learned Loss , 2019, 2020 25th International Conference on Pattern Recognition (ICPR).
[7] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[8] Pieter Abbeel,et al. Evolved Policy Gradients , 2018, NeurIPS.
[9] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[10] Peter L. Bartlett,et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.
[11] Yoshua Bengio,et al. Learning a synaptic learning rule , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[12] Lee Spector,et al. Evolution of reward functions for reinforcement learning , 2011, GECCO.
[13] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[14] Guy Lever,et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning , 2018, Science.
[15] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[16] Juergen Schmidhuber,et al. On learning how to learn learning strategies , 1994 .
[17] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[18] Sepp Hochreiter,et al. Learning to Learn Using Gradient Descent , 2001, ICANN.
[19] Sergey Levine,et al. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning , 2018, Robotics: Science and Systems.
[20] Michael I. Jordan,et al. RLlib: Abstractions for Distributed Reinforcement Learning , 2017, ICML.
[21] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[22] R. J. Williams,et al. On the use of backpropagation in associative reinforcement learning , 1988, IEEE 1988 International Conference on Neural Networks.
[23] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[24] Pieter Abbeel,et al. A Simple Neural Attentive Meta-Learner , 2017, ICLR.
[25] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[26] Marcin Andrychowicz,et al. Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research , 2018, ArXiv.
[27] Marcin Andrychowicz,et al. Learning to learn by gradient descent by gradient descent , 2016, NIPS.
[28] Jitendra Malik,et al. Learning to Optimize Neural Nets , 2017, ArXiv.
[29] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[30] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[31] Li Zhang,et al. Learning to Learn: Meta-Critic Networks for Sample Efficient Learning , 2017, ArXiv.
[32] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[33] Jürgen Schmidhuber,et al. Learning to forget: continual prediction with LSTM , 1999 .
[34] Marco Wiering,et al. HQ-Learning: Discovering Markovian Subgoals for Non-Markovian Reinforcement Learning , 1996 .
[35] Jürgen Schmidhuber,et al. Learning to generate sub-goals for action sequences , 1991 .
[36] John Schulman,et al. Gotta Learn Fast: A New Benchmark for Generalization in RL , 2018, ArXiv.
[37] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[38] Wulfram Gerstner,et al. Reduction of the Hodgkin-Huxley Equations to a Single-Variable Threshold Model , 1997, Neural Computation.
[39] Stewart W. Wilson,et al. A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers , 1991 .
[40] Zeb Kurth-Nelson,et al. Learning to reinforcement learn , 2016, CogSci.
[41] Ruslan Salakhutdinov,et al. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.
[42] Jeff Clune,et al. AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence , 2019, ArXiv.
[43] Sergey Levine,et al. Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm , 2017, ICLR.
[44] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[45] Ion Stoica,et al. Ray RLLib: A Composable and Scalable Reinforcement Learning Library , 2017, NIPS 2017.
[46] Nuttapong Chentanez,et al. Intrinsically Motivated Reinforcement Learning , 2004, NIPS.
[47] Jitendra Malik,et al. Learning to Optimize , 2016, ICLR.
[48] Daan Wierstra,et al. One-shot Learning with Memory-Augmented Neural Networks , 2016, ArXiv.
[49] Thomas L. Griffiths,et al. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes , 2018, ICLR.
[50] David Silver,et al. Meta-Gradient Reinforcement Learning , 2018, NeurIPS.
[51] Jürgen Schmidhuber,et al. Recurrent World Models Facilitate Policy Evolution , 2018, NeurIPS.
[52] Razvan Pascanu,et al. Policy Distillation , 2015, ICLR.
[53] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[54] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[55] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[56] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[57] Razvan Pascanu,et al. Meta-Learning with Latent Embedding Optimization , 2018, ICLR.
[58] Jieyu Zhao,et al. Direct Policy Search and Uncertain Policy Evaluation , 1998 .
[59] Leslie Pack Kaelbling,et al. Meta-learning curiosity algorithms , 2020, ICLR.
[60] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[61] Yoshua Bengio,et al. Bayesian Model-Agnostic Meta-Learning , 2018, NeurIPS.