ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization

Recent advances in large-scale pre-training have shown the great potential of leveraging a large set of Pre-Trained Models (PTMs) to improve Out-of-Distribution (OoD) generalization, where the goal is to perform well on unseen domains after fine-tuning on multiple training domains. However, fully exploiting a zoo of PTMs is challenging: fine-tuning all possible combinations of PTMs is computationally prohibitive, and accurately selecting PTMs requires accounting for the data distribution shift inherent in OoD tasks. In this work, we propose ZooD, a paradigm for PTM ranking and ensembling with feature selection. Our proposed metric ranks PTMs by quantifying the inter-class discriminability and inter-domain stability of the features extracted by each PTM, in a leave-one-domain-out cross-validation manner. The top-K ranked models are then aggregated for the target OoD task. To avoid accumulating the noise introduced by model ensembling, we propose an efficient variational EM algorithm that selects informative features. We evaluate our paradigm on a diverse model zoo of 35 models across various OoD tasks and demonstrate that: (i) our model ranking correlates better with fine-tuning ranking than previous methods and is up to 9859x faster than brute-force fine-tuning; (ii) OoD generalization after model ensembling with feature selection outperforms state-of-the-art methods, improving accuracy on the most challenging benchmark, DomainNet, from 46.5% to 50.6%. Furthermore, we provide the fine-tuning results of all 35 PTMs on the 7 OoD datasets, which we hope will support research on model zoos and OoD generalization. Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/zood.
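To make the ranking step concrete, below is a minimal sketch, in Python with NumPy, of how such a leave-one-domain-out score could be computed from frozen features. The Fisher-style between/within scatter ratio (standing in for inter-class discriminability) and the cosine similarity of per-class feature means across domains (standing in for inter-domain stability) are illustrative surrogates assumed here, not the paper's exact metric; proxy_score and rank_zoo are hypothetical names.

import numpy as np

def proxy_score(feats, labels, domains):
    # feats:   (N, D) features from one frozen PTM (numpy array)
    # labels:  (N,) integer class labels
    # domains: (N,) integer domain ids
    # Returns a leave-one-domain-out score; higher should indicate a more
    # promising PTM for the OoD task. Surrogate metric, for illustration only.
    scores = []
    for d in np.unique(domains):
        tr, te = domains != d, domains == d  # hold out one domain
        # keep classes observed both in the training domains and the held-out one
        classes = [c for c in np.unique(labels[tr]) if (te & (labels == c)).any()]
        mu_tr = np.stack([feats[tr & (labels == c)].mean(0) for c in classes])
        mu_te = np.stack([feats[te & (labels == c)].mean(0) for c in classes])
        # inter-class discriminability: between-class vs. within-class scatter
        grand = feats[tr].mean(0)
        between = ((mu_tr - grand) ** 2).sum()
        within = sum(((feats[tr & (labels == c)] - mu_tr[i]) ** 2).sum()
                     for i, c in enumerate(classes))
        disc = between / (within + 1e-8)
        # inter-domain stability: do per-class means survive the domain shift?
        cos = (mu_tr * mu_te).sum(1) / (np.linalg.norm(mu_tr, axis=1)
                                        * np.linalg.norm(mu_te, axis=1) + 1e-8)
        scores.append(disc * cos.mean())
    return float(np.mean(scores))

def rank_zoo(features_by_model, labels, domains, k=3):
    # Rank the zoo by the proxy score and keep the top-K PTMs for ensembling.
    ranked = sorted(features_by_model,
                    key=lambda m: proxy_score(features_by_model[m], labels, domains),
                    reverse=True)
    return ranked[:k]

Because the features are extracted once from frozen PTMs, scoring an entire zoo this way requires no gradient updates, which is where the large speedup over brute-force fine-tuning of every candidate comes from.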
