MetaNODE: Prototype Optimization as a Neural ODE for Few-Shot Learning

Few-Shot Learning (FSL) is the challenging task of recognizing novel classes from only a few labeled examples. Pre-training based methods tackle the problem effectively by pre-training a feature extractor and then predicting novel classes with a cosine nearest-neighbor classifier over mean-based prototypes. However, because data are scarce, the mean-based prototypes are usually biased. In this paper, we attempt to diminish this prototype bias by casting it as a prototype optimization problem. To this end, we propose a novel meta-learning based prototype optimization framework that rectifies prototypes by introducing a meta-optimizer to optimize them. Although existing meta-optimizers can also be adapted to our framework, they all overlook a crucial gradient bias issue: the mean-based gradient estimate is itself biased on sparse data. To address this issue, we regard the gradient and its flow as meta-knowledge and propose a novel Neural Ordinary Differential Equation (ODE)-based meta-optimizer, called MetaNODE, to polish prototypes. In this meta-optimizer, we first take the mean-based prototypes as initial prototypes and then model the process of prototype optimization as continuous-time dynamics specified by a Neural ODE. A gradient flow inference network is carefully designed to learn to estimate the continuous gradient flow for the prototype dynamics. Finally, the optimal prototypes are obtained by solving the Neural ODE. Extensive experiments on miniImagenet, tieredImagenet, and CUB-200-2011 show the effectiveness of our method.
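The idea of treating prototype optimization as continuous-time dynamics can be illustrated with a small NumPy sketch. This is not the MetaNODE implementation: the learned gradient flow inference network is replaced here by the plain analytic gradient of the cross-entropy loss on the support set (precisely the mean-based estimate the abstract describes as biased), and the ODE is solved with a fixed-step Euler scheme. The function names, the temperature `scale`, the integration horizon, and the synthetic 5-way 5-shot features are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_prototypes(feats, labels, n_classes):
    # Initial prototypes p(0): per-class mean of the support features.
    return np.stack([feats[labels == k].mean(axis=0) for k in range(n_classes)])

def class_probs(feats, protos, scale):
    # Cosine classifier: softmax over scaled cosine similarities.
    return softmax(scale * l2_normalize(feats) @ l2_normalize(protos).T)

def support_loss(protos, feats, labels, scale=10.0):
    probs = class_probs(feats, protos, scale)
    return float(-np.log(probs[np.arange(len(labels)), labels]).mean())

def gradient_flow(protos, feats, labels, scale=10.0):
    # dp/dt = -dL/dp for the cross-entropy loss on the support set.
    # Stand-in (assumption) for a learned gradient flow network.
    n, k = len(labels), protos.shape[0]
    p_hat, f_hat = l2_normalize(protos), l2_normalize(feats)
    probs = class_probs(feats, protos, scale)
    onehot = np.eye(k)[labels]
    g_hat = scale * (probs - onehot).T @ f_hat / n  # gradient w.r.t. unit prototypes
    # Chain rule through the normalization p -> p / ||p||.
    g = g_hat - (g_hat * p_hat).sum(axis=1, keepdims=True) * p_hat
    g /= np.linalg.norm(protos, axis=1, keepdims=True)
    return -g

def solve_euler(protos0, feats, labels, t_end=5.0, n_steps=50):
    # Fixed-step Euler integration of the prototype dynamics from t=0 to t_end.
    h = t_end / n_steps
    protos = protos0.copy()
    for _ in range(n_steps):
        protos = protos + h * gradient_flow(protos, feats, labels)
    return protos

# Synthetic 5-way 5-shot episode (hypothetical features, not real data).
rng = np.random.default_rng(0)
n_way, k_shot, dim = 5, 5, 16
centers = rng.normal(size=(n_way, dim))
labels = np.repeat(np.arange(n_way), k_shot)
feats = centers[labels] + rng.normal(size=(n_way * k_shot, dim))

protos0 = mean_prototypes(feats, labels, n_way)
protos_T = solve_euler(protos0, feats, labels)
init_loss = support_loss(protos0, feats, labels)
final_loss = support_loss(protos_T, feats, labels)
print(f"support loss at t=0: {init_loss:.4f}, at t=T: {final_loss:.4f}")
```

In this sketch the vector field depends only on the support set, so it inherits the bias of a sparse-data gradient estimate. Swapping `gradient_flow` for a meta-learned network and `solve_euler` for a higher-order solver would move it closer to the setup the abstract outlines.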
