CompositeTasking: Understanding Images by Spatial Composition of Tasks

We define the concept of CompositeTasking as the fusion of multiple, spatially distributed tasks, for various aspects of image understanding. Learning to perform spatially distributed tasks is motivated by the frequent availability of only sparse labels across tasks, and the desire for a compact multi-tasking network. To facilitate CompositeTasking, we introduce a novel task conditioning model -- a single encoder-decoder network that performs multiple, spatially varying tasks at once. The proposed network takes a pair of an image and a set of pixel-wise dense tasks as inputs, and makes the task related predictions for each pixel, which includes the decision of applying which task where. As to the latter, we learn the composition of tasks that needs to be performed according to some CompositeTasking rules. It not only offers us a compact network for multi-tasking, but also allows for task-editing. The strength of the proposed method is demonstrated by only having to supply sparse supervision per task. The obtained results are on par with our baselines that use dense supervision and a multi-headed multi-tasking design. The source code will be made publicly available at this http URL .

[1]  Luc Van Gool,et al.  SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects , 2020, ECCV.

[2]  Subhransu Maji,et al.  Task2Vec: Task Embedding for Meta-Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Iasonas Kokkinos,et al.  Attentive Single-Tasking of Multiple Tasks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[5]  Martial Hebert,et al.  Cross-Stitch Networks for Multi-task Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Dinesh Manocha,et al.  EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege’s Principle , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Chao Dong,et al.  Recovering Realistic Texture in Image Super-Resolution by Deep Spatial Feature Transform , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Luc Van Gool,et al.  Branched Multi-Task Networks: Deciding what layers to share , 2019, BMVC.

[10]  Nicu Sebe,et al.  PAD-Net: Multi-tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Chitta Baral,et al.  Integrating Knowledge and Reasoning in Image Understanding , 2019, IJCAI.

[12]  Santiago Manen,et al.  PathTrack: Fast Trajectory Annotation with Path Supervision , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Sanja Fidler,et al.  Visual Reasoning by Progressive Module Networks , 2018, ICLR.

[14]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[15]  Xiang Li,et al.  Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation , 2018, ECCV.

[16]  Yu Cheng,et al.  Fully-Adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[18]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Qi Wu,et al.  Visual question answering: A survey of methods and datasets , 2016, Comput. Vis. Image Underst..

[20]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[21]  Greg Mori,et al.  Learning a Deep ConvNet for Multi-Label Classification With Partial Labels , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Peng Wang,et al.  Semantic Instance Segmentation via Deep Metric Learning , 2017, ArXiv.

[23]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[24]  Michael J. Black,et al.  FAUST: Dataset and Evaluation for 3D Mesh Registration , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Antonio Torralba,et al.  Parsing IKEA Objects: Fine Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[26]  Andrea Vedaldi,et al.  Universal representations: The missing link between faces, text, planktons, and cat breeds , 2017, ArXiv.

[27]  Weihong Deng,et al.  Learning temporal features using LSTM-CNN architecture for face anti-spoofing , 2015, 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR).

[28]  Serge J. Belongie,et al.  Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[30]  Zhiao Huang,et al.  Associative Embedding: End-to-End Learning for Joint Detection and Grouping , 2016, NIPS.

[31]  Andrea Vedaldi,et al.  Learning multiple visual domains with residual adapters , 2017, NIPS.

[32]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[33]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[34]  Ana Cristina Murillo,et al.  Coral-Segmentation: Training Dense Labeling Models with Sparse Ground Truth , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[35]  Yunchao Wei,et al.  Proposal-Free Network for Instance-Level Object Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Yoav Artzi,et al.  TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Ying Wu,et al.  A Modulation Module for Multi-task Learning with Applications in Image Retrieval , 2018, ECCV.

[38]  Luc Van Gool,et al.  MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning , 2020, ECCV.

[39]  Sjoerd van Steenkiste,et al.  Are Disentangled Representations Helpful for Abstract Visual Reasoning? , 2019, NeurIPS.

[40]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[41]  Senthil Yogamani,et al.  NeurAll: Towards a Unified Visual Perception Model for Automated Driving , 2019, 2019 IEEE Intelligent Transportation Systems Conference (ITSC).

[42]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Luc Van Gool,et al.  Fast Scene Understanding for Autonomous Driving , 2017, ArXiv.

[44]  Jeff Donahue,et al.  Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[45]  Luc Van Gool,et al.  Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference , 2020, ECCV.

[46]  Xin Ye,et al.  From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN) , 2020, ArXiv.

[47]  Anthony M. Zador,et al.  A critique of pure learning and what artificial neural networks can learn from animal brains , 2019, Nature Communications.

[48]  Dat T. Huynh,et al.  Interactive Multi-Label CNN Learning With Partial Labels , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Marcel Worring,et al.  Many Task Learning With Task Routing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  M. Pollefeys,et al.  DeepLiDAR: Deep Surface Normal Guided Depth Prediction for Outdoor Scene From Sparse LiDAR Data and Single Color Image , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Michael Crawshaw,et al.  Multi-Task Learning with Deep Neural Networks: A Survey , 2020, ArXiv.

[53]  Nicu Sebe,et al.  Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Luc Van Gool,et al.  Semantic Instance Segmentation with a Discriminative Loss Function , 2017, ArXiv.

[56]  Luc Van Gool,et al.  Semantic Instance Segmentation for Autonomous Driving , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[57]  Luc Van Gool,et al.  Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Andrew J. Davison,et al.  End-To-End Multi-Task Learning With Attention , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Andrea Vedaldi,et al.  Efficient Parametrization of Multi-domain Deep Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[62]  M. Jorge Cardoso,et al.  Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[63]  Jordi Pont-Tuset,et al.  Measures and Meta-Measures for the Supervised Evaluation of Image Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.