Exploring Relational Context for Multi-Task Dense Prediction

The timeline of computer vision research is marked with advances in learning and utilizing efficient contextual representations. Most of them, however, are targeted at improving model performance on a single downstream task. We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads. Our goal is to find the most efficient way to refine each task prediction by capturing cross-task contexts dependent on tasks’ relations. We explore various attention-based contexts, such as global and local, in the multi-task setting and analyze their behavior when applied to refine each task independently. Empirical findings confirm that different source-target task pairs benefit from different context types. To automate the selection process, we propose an Adaptive Task-Relational Context (ATRC) module, which samples the pool of all available contexts for each task pair using neural architecture search and outputs the optimal configuration for deployment. Our method achieves state-of-the-art performance on two important multi-task benchmarks, namely NYUD-v2 and PASCAL-Context. The proposed ATRC has a low computational toll and can be used as a drop-in refinement module for any supervised multi-task architecture.

[1]  Yu Cheng,et al.  Fully-Adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[3]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[4]  L. Gool,et al.  Automated Search for Resource-Efficient Branched Multi-Task Networks , 2020, BMVC.

[5]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[6]  Luc Van Gool,et al.  Branched Multi-Task Networks: Deciding what layers to share , 2019, BMVC.

[7]  Quoc V. Le,et al.  Attention Augmented Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Zhibin Hong,et al.  ACFNet: Attentional Class Feature Network for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Ben Poole,et al.  Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jitendra Malik,et al.  Learning Rich Features from RGB-D Images for Object Detection and Segmentation , 2014, ECCV.

[12]  Sabine Süsstrunk,et al.  Frequency-tuned salient region detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Liang Lin,et al.  SNAS: Stochastic Neural Architecture Search , 2018, ICLR.

[15]  Wei Liu,et al.  MTL-NAS: Task-Agnostic Neural Architecture Search Towards General-Purpose Multi-Task Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Yang Zhao,et al.  Deep High-Resolution Representation Learning for Visual Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[18]  Alok Aggarwal,et al.  Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.

[19]  Yee Whye Teh,et al.  The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables , 2016, ICLR.

[20]  Luc Van Gool,et al.  MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning , 2020, ECCV.

[21]  Zhao Chen,et al.  GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks , 2017, ICML.

[22]  Xin Yao,et al.  Ensemble learning via negative correlation , 1999, Neural Networks.

[23]  Andrew J. Davison,et al.  End-To-End Multi-Task Learning With Attention , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Quoc V. Le,et al.  Neural Architecture Search with Reinforcement Learning , 2016, ICLR.

[25]  Han Zhang,et al.  Co-Occurrent Features in Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Nicu Sebe,et al.  PAD-Net: Multi-tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Nikolaos Pappas,et al.  Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention , 2020, ICML.

[28]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[29]  Iasonas Kokkinos,et al.  Attentive Single-Tasking of Multiple Tasks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  R. Light Measures of response agreement for qualitative data: Some generalizations and alternatives. , 1971 .

[31]  Iasonas Kokkinos,et al.  Pushing the Boundaries of Boundary Detection using Deep Learning , 2015, ICLR 2016.

[32]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Bo Li,et al.  Monocular Depth Estimation with Hierarchical Fusion of Dilated CNNs and Soft-Weighted-Sum Inference , 2017, Pattern Recognit..

[34]  Wei Liu,et al.  NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Chaoqun Wang,et al.  Pattern-Structure Diffusion for Multi-Task Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[37]  Iasonas Kokkinos,et al.  UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Zheng Zhang,et al.  Disentangled Non-Local Neural Networks , 2020, ECCV.

[39]  Rich Caruana,et al.  Multitask Learning: A Knowledge-Based Source of Inductive Bias , 1993, ICML.

[40]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[41]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[42]  Xilin Chen,et al.  Object-Contextual Representations for Semantic Segmentation , 2019, ECCV.

[43]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[44]  Jun Fu,et al.  Dual Attention Network for Scene Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Shuicheng Yan,et al.  A2-Nets: Double Attention Networks , 2018, NeurIPS.

[46]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[47]  Yiming Yang,et al.  DARTS: Differentiable Architecture Search , 2018, ICLR.

[48]  Marc Pollefeys,et al.  Discriminatively Trained Dense Surface Normal Estimation , 2014, ECCV.

[49]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[50]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Sanja Fidler,et al.  Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Ashish Vaswani,et al.  Stand-Alone Self-Attention in Vision Models , 2019, NeurIPS.

[54]  Hang Su,et al.  Pixel-Adaptive Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[56]  DSNAS: Direct Neural Architecture Search Without Parameter Retraining , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Nicu Sebe,et al.  Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Quoc V. Le,et al.  Large-Scale Evolution of Image Classifiers , 2017, ICML.

[59]  Martial Hebert,et al.  Cross-Stitch Networks for Multi-task Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Luc Van Gool,et al.  Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference , 2020, ECCV.

[61]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[63]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[64]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.