End-To-End Multi-Task Learning With Attention

We propose a novel multi-task learning architecture that allows the learning of task-specific, feature-level attention. Our design, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with a soft-attention module for each task. These modules learn task-specific features from the global features, whilst still allowing features to be shared across different tasks. The architecture can be trained end-to-end, can be built upon any feed-forward neural network, is simple to implement, and is parameter-efficient. We evaluate our approach on a variety of datasets, across both image-to-image prediction and image classification tasks. We show that our architecture achieves state-of-the-art multi-task performance compared with existing methods, and is also less sensitive to the choice of weighting scheme in the multi-task loss function. Code is available at https://github.com/lorenmt/mtan.
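
The full reference implementation is linked above; as a minimal sketch of the core idea in PyTorch (the layer choices, module names, and dimensions here are assumptions for illustration, not the authors' exact design), a per-task soft-attention module reads from the shared feature pool, predicts a soft mask in [0, 1], and gates the shared features element-wise:

```python
import torch
import torch.nn as nn

class SoftAttentionModule(nn.Module):
    """Illustrative per-task attention block (assumed design, not the
    paper's exact layers): shared features -> soft mask -> gated features."""
    def __init__(self, channels):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),  # soft mask in [0, 1], one weight per feature element
        )

    def forward(self, shared_features):
        mask = self.attention(shared_features)
        return mask * shared_features  # task-specific features


# Usage: one attention module per task, all reading the same shared pool.
shared = torch.randn(2, 64, 32, 32)  # features from the shared backbone
tasks = ["segmentation", "depth"]
attention = nn.ModuleDict({t: SoftAttentionModule(64) for t in tasks})
task_features = {t: m(shared) for t, m in attention.items()}
```

Because each task only adds a lightweight attention head on top of a single shared backbone, and gradients flow through both the masks and the shared features, the whole network can be trained end-to-end while remaining parameter-efficient relative to one full network per task.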
