Transferring Knowledge across Learning Processes

In complex transfer learning scenarios, new tasks may not be tightly linked to previous tasks. Approaches that transfer only the information contained in the final parameters of a source model will therefore struggle. Instead, transfer learning at a higher level of abstraction is needed. We propose Leap, a framework that achieves this by transferring knowledge across learning processes. We associate each task with a manifold on which the training process travels from initialization to final parameters, and we construct a meta-learning objective that minimizes the expected length of this path. Our framework leverages only information obtained during training and can be computed on the fly at negligible cost. We demonstrate that it outperforms competing methods, both in meta-learning and transfer learning, on a set of computer vision tasks. Finally, we show that Leap can transfer knowledge across learning processes in demanding reinforcement learning environments (Atari) that involve millions of gradient steps.
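
The abstract's central objects are a per-task training trajectory theta_0, theta_1, ..., theta_K and the length of the path it traces from the shared initialization to the task's final parameters. Below is a minimal NumPy sketch of how such an objective could be optimized on the fly. Everything beyond the abstract is an assumption for illustration: the task interface (task.loss, task.grad), the squared step length ||theta_{i+1} - theta_i||^2 + (f(theta_{i+1}) - f(theta_i))^2, the first-order treatment that holds the later iterate fixed and maps step gradients directly onto the initialization, and all names.

```python
import numpy as np

def leap_meta_step(theta0, tasks, inner_steps, inner_lr, meta_lr):
    """One hypothetical meta-update of a shared initialization theta0.

    Each task is assumed to expose loss(theta) -> float and
    grad(theta) -> np.ndarray. A task's path length is approximated by
    the summed squared distances between consecutive iterates, measured
    jointly in parameter and loss space.
    """
    meta_grad = np.zeros_like(theta0)
    for task in tasks:
        theta = theta0.copy()
        for _ in range(inner_steps):
            f_prev = task.loss(theta)
            g_prev = task.grad(theta)
            theta_next = theta - inner_lr * g_prev  # ordinary SGD step
            f_next = task.loss(theta_next)
            # Gradient of the squared step length
            #   d2 = ||theta_next - theta||^2 + (f_next - f_prev)^2
            # w.r.t. theta, holding the later iterate fixed so that the
            # initialization is pulled forward along the trajectory
            # (first-order approximation: d theta_i / d theta0 ~ I):
            meta_grad += -2.0 * ((theta_next - theta)
                                 + (f_next - f_prev) * g_prev)
            theta = theta_next
    # Shorten the expected path length by moving the initialization.
    return theta0 - meta_lr * meta_grad / len(tasks)
```

Because the meta-gradient is accumulated from quantities the inner loop computes anyway (iterates, losses, gradients), such a scheme adds essentially no overhead to ordinary training, consistent with the abstract's "on the fly at negligible cost" claim.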
