Interventional Few-Shot Learning

We uncover an ever-overlooked deficiency in the prevailing Few-Shot Learning (FSL) methods: the pre-trained knowledge is indeed a confounder that limits the performance. This finding is rooted from our causal assumption: a Structural Causal Model (SCM) for the causalities among the pre-trained knowledge, sample features, and labels. Thanks to it, we propose a novel FSL paradigm: Interventional Few-Shot Learning (IFSL). Specifically, we develop three effective IFSL algorithmic implementations based on the backdoor adjustment, which is essentially a causal intervention towards the SCM of many-shot learning: the upper-bound of FSL in a causal view. It is worth noting that the contribution of IFSL is orthogonal to existing fine-tuning and meta-learning based FSL methods, hence IFSL can improve all of them, achieving a new 1-/5-shot state-of-the-art on \textit{mini}ImageNet, \textit{tiered}ImageNet, and cross-domain CUB. Code is released at this https URL.

[1]  Martial Hebert,et al.  Image Deformation Meta-Networks for One-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[3]  Hanwang Zhang,et al.  Two Causal Principles for Improving Visual Dialog , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Joshua B. Tenenbaum,et al.  Meta-Learning for Semi-Supervised Few-Shot Classification , 2018, ICLR.

[5]  Alexei A. Efros,et al.  What makes ImageNet good for transfer learning? , 2016, ArXiv.

[6]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Alexandre Lacoste,et al.  TADAM: Task dependent adaptive metric for improved few-shot learning , 2018, NeurIPS.

[8]  Neil D. Lawrence,et al.  Empirical Bayes Transductive Meta-Learning with Synthetic Gradients , 2020, ICLR.

[9]  Bernhard Schölkopf,et al.  Counterfactuals uncover the modular structure of deep generative models , 2018, ICLR.

[10]  Quanming Yao,et al.  Few-shot Learning: A Survey , 2019, ArXiv.

[11]  Jing Zhang,et al.  Few-Shot Learning via Saliency-Guided Hallucination of Samples , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Bernhard Schölkopf,et al.  Discovering Causal Signals in Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Mubarak Shah,et al.  Task Agnostic Meta-Learning for Few-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[15]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[16]  Tiago Ramalho,et al.  An empirical study of pretrained representations for few-shot classification , 2019, ArXiv.

[17]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Bernhard Schölkopf,et al.  Learning Independent Causal Mechanisms , 2017, ICML.

[19]  Brian D. Davison,et al.  Impact of ImageNet Model Selection on Domain Adaptation , 2020, 2020 IEEE Winter Applications of Computer Vision Workshops (WACVW).

[20]  Stefano Soatto,et al.  A Baseline for Few-Shot Image Classification , 2019, ICLR.

[21]  J. Pearl Causality: Models, Reasoning and Inference , 2000 .

[22]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[23]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[24]  J. Schulman,et al.  Reptile: a Scalable Metalearning Algorithm , 2018 .

[25]  Hanwang Zhang,et al.  Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect , 2020, NeurIPS.

[26]  Yi Ma,et al.  Robust principal component analysis? , 2009, JACM.

[27]  Jay L. Devore,et al.  A Modern Introduction to Probability and Statistics: Understanding Why and How , 2006 .

[28]  V. Kshirsagar,et al.  Face recognition using Eigenfaces , 2011, 2011 3rd International Conference on Computer Research and Development.

[29]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[31]  Amos J. Storkey,et al.  Data Augmentation Generative Adversarial Networks , 2017, ICLR 2018.

[32]  Jianqiang Huang,et al.  Unbiased Scene Graph Generation From Biased Training , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  J. Pearl,et al.  Causal Inference in Statistics: A Primer , 2016 .

[34]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[35]  Xiaogang Wang,et al.  Finding Task-Relevant Features for Few-Shot Learning by Category Traversal , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Xilin Chen,et al.  Cross Attention Network for Few-shot Classification , 2019, NeurIPS.

[37]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[38]  Bernt Schiele,et al.  Meta-Transfer Learning for Few-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Atsuto Maki,et al.  Factors of Transferability for a Generic ConvNet Representation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  J. Pearl,et al.  Bounds on Treatment Effects from Studies with Imperfect Compliance , 1997 .

[41]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[42]  Yoshua Bengio,et al.  MetaGAN: An Adversarial Approach to Few-Shot Learning , 2018, NeurIPS.

[43]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[44]  Feiyue Huang,et al.  LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning , 2019, ICML.

[45]  Yoshua Bengio,et al.  Deep Learning of Representations for Unsupervised and Transfer Learning , 2011, ICML Unsupervised and Transfer Learning.

[46]  Daan Wierstra,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[47]  知秀 柴田 5分で分かる!? 有名論文ナナメ読み:Jacob Devlin et al. : BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .

[48]  Hanwang Zhang,et al.  Visual Commonsense R-CNN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Hanwang Zhang,et al.  Deconfounded Image Captioning: A Causal Retrospect , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Albert Gatt,et al.  Transfer learning from language models to image caption generators: Better models may not transfer better , 2019, ArXiv.

[51]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[52]  Quoc V. Le,et al.  Do Better ImageNet Models Transfer Better? , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Aoxue Li,et al.  Boosting Few-Shot Learning With Adaptive Margin Loss , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Pierre Baldi,et al.  The dropout learning algorithm , 2014, Artif. Intell..

[56]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[57]  Guosheng Lin,et al.  DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover’s Distance and Structured Classifiers , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Yu-Chiang Frank Wang,et al.  A Closer Look at Few-shot Classification , 2019, ICLR.

[59]  Pietro Perona,et al.  Visual Causal Feature Learning , 2014, UAI.

[60]  Jinhui Tang,et al.  Causal Intervention for Weakly-Supervised Semantic Segmentation , 2020, NeurIPS.

[61]  Razvan Pascanu,et al.  Meta-Learning with Latent Embedding Optimization , 2018, ICLR.

[62]  J. Pearl Causal diagrams for empirical research , 1995 .

[63]  Bernhard Schölkopf,et al.  Domain Adaptation with Conditional Transferable Components , 2016, ICML.

[64]  Stefan Bauer,et al.  Robustly Disentangled Causal Mechanisms: Validating Deep Representations for Interventional Robustness , 2018, ICML.

[65]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[66]  Wei Shen,et al.  Few-Shot Image Recognition by Predicting Parameters from Activations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[67]  Joris M. Mooij,et al.  Domain Adaptation by Using Causal Inference to Predict Invariant Conditional Distributions , 2017, NeurIPS.

[68]  Christopher Joseph Pal,et al.  A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms , 2019, ICLR.

[69]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[70]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[71]  Yan Wang,et al.  SimpleShot: Revisiting Nearest-Neighbor Classification for Few-Shot Learning , 2019, ArXiv.

[72]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[73]  Bernt Schiele,et al.  An Ensemble of Epoch-Wise Empirical Bayes for Few-Shot Learning , 2019, ECCV.

[74]  Sergey Levine,et al.  Causal Confusion in Imitation Learning , 2019, NeurIPS.

[75]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[76]  J. Angrist,et al.  Journal of Economic Perspectives—Volume 15, Number 4—Fall 2001—Pages 69–85 Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments , 2022 .

[77]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[78]  Nikos Komodakis,et al.  Generating Classification Weights With GNN Denoising Autoencoders for Few-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[79]  Fei Sha,et al.  Learning Embedding Adaptation for Few-Shot Learning , 2018, ArXiv.

[80]  Sanjeev Arora,et al.  Implicit Regularization in Deep Matrix Factorization , 2019, NeurIPS.

[81]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .