Adapting Auxiliary Losses Using Gradient Similarity

One approach to dealing with the statistical inefficiency of neural networks is to rely on auxiliary losses that help to build useful representations. However, it is not always trivial to know whether an auxiliary task will be helpful to the main task, or when it might start to hurt. We propose to use the cosine similarity between the gradients of the two tasks as an adaptive weight that detects when an auxiliary loss is helping the main loss. We show that our approach is guaranteed to converge to critical points of the main task, and we demonstrate its practical usefulness in several domains: multi-task supervised learning on subsets of ImageNet, reinforcement learning on gridworld, and reinforcement learning on Atari games.
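To make the idea concrete, the sketch below applies one such update in PyTorch. It is a minimal illustration, not the paper's exact algorithm: the function name `cosine_weighted_step`, the learning rate, and the choice to clamp the cosine similarity at zero (so a conflicting auxiliary gradient is dropped rather than negated) are assumptions made for the example.

```python
import torch

def cosine_weighted_step(params, main_loss, aux_loss, lr=1e-3):
    """One SGD step in which the auxiliary gradient is scaled by the
    cosine similarity between the two tasks' gradients, clamped at
    zero so a conflicting auxiliary task is ignored (assumption: this
    clamped variant stands in for the paper's adaptive weight).
    `params` is a list of tensors with requires_grad=True shared by
    both losses."""
    params = list(params)
    # Per-parameter gradients of each loss w.r.t. the shared parameters.
    g_main = torch.autograd.grad(main_loss, params, retain_graph=True)
    g_aux = torch.autograd.grad(aux_loss, params)

    # Flatten so the similarity is measured over all parameters at once.
    g_main_flat = torch.cat([g.reshape(-1) for g in g_main])
    g_aux_flat = torch.cat([g.reshape(-1) for g in g_aux])

    cos = torch.nn.functional.cosine_similarity(g_main_flat, g_aux_flat, dim=0)
    weight = torch.clamp(cos, min=0.0)  # drop the auxiliary task when it conflicts

    # Plain SGD update on the combined gradient.
    with torch.no_grad():
        for p, gm, ga in zip(params, g_main, g_aux):
            p -= lr * (gm + weight * ga)
    return cos.item()
```

When the two gradients point in similar directions (cosine near 1), the auxiliary loss contributes almost fully; when they are orthogonal or opposed (cosine at or below 0), its contribution vanishes, which is what makes a convergence guarantee with respect to the main task plausible.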
