Paper information - Learning to Learn: Meta-Critic Networks for Sample Efficient Learning

We propose a novel and flexible approach to meta-learning for learning-to-learn from only a few examples. Our framework is motivated by actor-critic reinforcement learning, but can be applied to both reinforcement and supervised learning. The key idea is to learn a meta-critic: an action-value function neural network that learns to criticise any actor trying to solve any specified task. For supervised learning, this corresponds to the novel idea of a trainable task-parametrised loss generator. This meta-critic approach provides a route to knowledge transfer that can flexibly deal with few-shot and semi-supervised conditions for both reinforcement and supervised learning. Promising results are shown on both reinforcement and supervised learning problems.
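The idea of a trainable task-parametrised loss generator can be illustrated with a toy sketch. The code below is not the paper's implementation; it only shows the general pattern under strong simplifying assumptions. Tasks are 1-D regressions y = a * x, the task embedding z is simply the slope a supplied by hand (a real system would presumably infer it with a learned encoder), and the critic is linear in three hand-picked features rather than a neural network. All names (features, critic, train_critic, adapt_actor) are hypothetical.

```python
import random


def features(z, x, y_hat):
    """Hand-picked features of (task embedding, input, actor output)."""
    t = z * x
    return [y_hat * y_hat, y_hat * t, t * t]


def critic(theta, z, x, y_hat):
    """Toy meta-critic: a linear value estimate Q(z, x, y_hat)."""
    return sum(th * f for th, f in zip(theta, features(z, x, y_hat)))


def train_critic(task_slopes, steps=20000, lr=0.02, seed=0):
    """Fit the critic to predict the reward -(y_hat - y)^2 across many tasks.

    Labels are used only here, on the training tasks, to compute the reward.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        a = rng.choice(task_slopes)      # sample a training task
        x = rng.uniform(-1.0, 1.0)       # sample an input
        y_hat = rng.uniform(-2.0, 2.0)   # stand-in for an arbitrary actor's output
        reward = -(y_hat - a * x) ** 2   # negative squared error
        phi = features(a, x, y_hat)
        err = critic(theta, a, x, y_hat) - reward
        # One SGD step on the squared prediction error of the critic.
        theta = [th - lr * err * f for th, f in zip(theta, phi)]
    return theta


def adapt_actor(theta, z, xs, steps=200, lr=0.5):
    """Train a linear actor y_hat = w * x on a new task using only the critic.

    No labels for the new task are used: the critic's output plays the role
    of a (negated) loss, and w follows its gradient by ascent.
    """
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for x in xs:
            y_hat = w * x
            # dQ/dy_hat for the linear critic, then chain rule through y_hat = w * x.
            dq_dy = 2.0 * theta[0] * y_hat + theta[1] * z * x
            grad += dq_dy * x
        w += lr * grad / len(xs)
    return w


theta = train_critic([-1.5, -0.5, 0.5, 1.0, 1.5])
w_new = adapt_actor(theta, z=0.8, xs=[-1.0, -0.5, 0.5, 1.0])
print("critic weights:", theta)
print("actor slope on unseen task:", w_new)
```

In this sketch the critic is trained once across tasks and then reused, with frozen weights, to supervise a fresh actor on a task whose slope never appeared during critic training. Because the embedding here is the true slope, the toy says nothing about how well a learned embedding would work in few-shot or semi-supervised conditions.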
