Auxiliary Learning for Deep Multi-task Learning

Multi-task learning (MTL) solves multiple tasks simultaneously, aiming for better speed and performance than handling each task in turn. Most current methods fall into one of two categories: (i) hard parameter sharing, where a subset of the parameters is shared among tasks while the remaining parameters are task-specific; or (ii) soft parameter sharing, where all parameters are task-specific but are jointly regularized. Both categories have limitations: the shared hidden layers of the former are difficult to optimize because of the competing task objectives, while the complexity of the latter grows linearly with the number of tasks. To mitigate these drawbacks, this paper proposes an alternative in which we explicitly construct an auxiliary module that mimics soft parameter sharing to assist the optimization of the hard parameter sharing layers during training. In particular, the auxiliary module takes the outputs of the shared hidden layers as inputs and is supervised by an auxiliary task loss. During training, the auxiliary module is jointly optimized with the MTL network and serves as a regularizer by introducing an inductive bias to the shared layers. In the testing phase, only the original MTL network is kept. Our method thus avoids the limitations of both categories. We evaluate the proposed auxiliary module on pixel-wise prediction tasks, including semantic segmentation, depth estimation, and surface normal prediction, with different network structures. Extensive experiments over various settings verify the effectiveness of our method.
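
The training and testing asymmetry described above can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation: the abstract does not specify the architecture of the auxiliary module or the form of the auxiliary task loss, so the single dense layers, the mean squared error losses, the weighting coefficient `aux_coeff`, and all names (`MTLNetwork`, `training_loss`, the task names `seg` and `depth`) are assumptions made for illustration only.

```python
def linear(weights, x):
    """Dense layer without bias: weights is a list of rows, x a list of floats."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(pred, target):
    """Mean squared error between two equally long lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class MTLNetwork:
    """Hard parameter sharing: one shared layer feeding one head per task."""

    def __init__(self, shared, heads):
        self.shared = shared  # weights of the shared hidden layer
        self.heads = heads    # dict: task name -> task-specific weights

    def features(self, x):
        # Shared hidden representation (ReLU activation, an assumption).
        return [max(0.0, h) for h in linear(self.shared, x)]

    def forward(self, x):
        # Test-time path: the auxiliary module is not involved at all.
        f = self.features(x)
        return {name: linear(w, f) for name, w in self.heads.items()}


def training_loss(net, aux_weights, x, targets, aux_target, aux_coeff=0.5):
    """Task losses plus a weighted auxiliary loss on the shared features.

    The auxiliary head reads the shared layer's output, so minimizing this
    sum would also push gradients into the shared parameters.
    """
    f = net.features(x)
    task_loss = sum(mse(linear(w, f), targets[name])
                    for name, w in net.heads.items())
    aux_loss = mse(linear(aux_weights, f), aux_target)
    return task_loss + aux_coeff * aux_loss


# Toy values, chosen only so the arithmetic is easy to follow.
net = MTLNetwork(
    shared=[[1.0, 0.0], [0.0, 1.0]],
    heads={"seg": [[1.0, 1.0]], "depth": [[1.0, -1.0]]},
)
aux_weights = [[0.5, 0.5]]  # hypothetical auxiliary module (training only)
x = [1.0, 2.0]

loss = training_loss(net, aux_weights, x,
                     targets={"seg": [3.0], "depth": [0.0]},
                     aux_target=[1.0])
predictions = net.forward(x)  # inference uses only the original MTL network
```

In this sketch the auxiliary weights appear only in `training_loss`; `forward` never touches them, which mirrors the statement that only the original MTL network is kept in the testing phase and that the auxiliary module adds no inference cost.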
