Unlocking the Full Potential of Small Data with Diverse Supervision

Virtually all of the deep learning literature relies on the assumption that large amounts of training data are available. Indeed, even the majority of few-shot learning methods rely on a large set of "base classes" for pre-training. This assumption, however, does not always hold. For some tasks, annotating a large number of classes can be infeasible, and in some scenarios even collecting the images themselves can be a challenge. In this paper, we study this problem and call it the "Small Data" setting, in contrast to "Big Data." To unlock the full potential of small data, we propose to augment the models with annotations for other related tasks, thus increasing their generalization abilities. In particular, we use the richly annotated scene parsing dataset ADE20K to construct our realistic Long-tail Recognition with Diverse Supervision (LRDS) benchmark by splitting the object categories into head and tail based on their distribution. Following the standard few-shot learning protocol, we use the head classes for representation learning and the tail classes for evaluation. We further subsample the head categories and images to generate two novel settings, which we call "Scarce-Class" and "Scarce-Image," corresponding to a shortage of training classes and of training images, respectively. Finally, we analyze the effect of applying various additional supervision sources under the proposed settings. Our experiments demonstrate that densely labeling a small set of images can largely remedy the small data constraints. Our code and benchmark are available at https://github.com/BinahHu/ADE-FewShot.
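
The benchmark construction described in the abstract (a head/tail split by category frequency, followed by subsampling of head classes or images) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `min_head_count` cutoff, the `keep_fraction` parameter, and the toy labels are all hypothetical, and the abstract does not state the actual split criterion or sampling ratios.

```python
import random
from collections import Counter


def split_head_tail(labels, min_head_count):
    """Split categories into head (frequent) and tail (rare) by instance count.

    `min_head_count` is a hypothetical cutoff; the abstract only says the split
    is based on the category distribution.
    """
    counts = Counter(labels)
    head = sorted(c for c, n in counts.items() if n >= min_head_count)
    tail = sorted(c for c, n in counts.items() if n < min_head_count)
    return head, tail


def subsample(items, keep_fraction, seed=0):
    """Keep a random fraction of `items` (at least one), reproducibly."""
    items = list(items)
    rng = random.Random(seed)
    k = max(1, round(len(items) * keep_fraction))
    return sorted(rng.sample(items, k))


# Toy instance labels standing in for annotated object instances.
labels = ["wall"] * 50 + ["chair"] * 30 + ["lamp"] * 3 + ["vase"] * 2
head, tail = split_head_tail(labels, min_head_count=10)

# "Scarce-Class"-style setting: fewer head classes for representation learning.
scarce_classes = subsample(head, keep_fraction=0.5)

# "Scarce-Image"-style setting: fewer training images (hypothetical image ids).
scarce_images = subsample(range(100), keep_fraction=0.1)

print(head, tail, scarce_classes, len(scarce_images))
```

In this sketch the head classes would be used for representation learning and the tail classes held out for few-shot evaluation; the fixed seed only keeps the toy subsampling reproducible.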
