Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How

With the ever-increasing number of pretrained models, machine learning practitioners are continuously faced with the decision of which pretrained model to use and how to finetune it for a new dataset. In this paper, we propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it. Our method transfers knowledge about the performance of many pretrained models under multiple hyperparameter configurations across a series of datasets. To this end, we evaluated over 20k hyperparameter configurations for finetuning 24 pretrained image classification models on 87 datasets to generate a large-scale meta-dataset. We meta-learn a multi-fidelity performance predictor on the learning curves of this meta-dataset and use it for fast hyperparameter optimization on new datasets. We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset together with its optimal hyperparameters.
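To make the idea of a joint search over pretrained models and finetuning hyperparameters concrete, below is a minimal sketch in Python. It is not the authors' implementation: the model names, the `score_candidate` predictor, and the `finetune_and_evaluate` routine are hypothetical placeholders, and the multi-fidelity predictor is stubbed out with random scores where a real system would use the model meta-learned on the learning-curve meta-dataset.

```python
# Minimal sketch (not the paper's code) of a joint search over pretrained models
# and finetuning hyperparameters, guided by a multi-fidelity performance
# predictor. All names below are hypothetical placeholders for illustration.
import random

# Joint search space: the pretrained model is treated as one more categorical
# choice alongside the usual finetuning hyperparameters.
SEARCH_SPACE = {
    "model": ["beit_base", "xcit_small", "efficientnet_b4"],  # example hub entries
    "lr": [1e-4, 3e-4, 1e-3],
    "finetune_strategy": ["full", "linear_probe", "lora"],
}
BUDGETS = [1, 2, 4, 8]  # fidelities (finetuning epochs) at which curves are observed


def sample_configuration():
    """Draw one joint (model, hyperparameter) configuration at random."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}


def score_candidate(config, budget, observed_curves):
    """Placeholder for the meta-learned multi-fidelity predictor: given a
    configuration, a target budget, and the partial learning curves observed so
    far on the new dataset, it would predict the validation accuracy."""
    return random.random()  # stand-in; a real predictor is trained on the meta-dataset


def finetune_and_evaluate(config, budget):
    """Placeholder for actually finetuning config["model"] for `budget` epochs."""
    return random.random()


def quick_tune_loop(n_iterations=10, n_candidates=20):
    """Greedy multi-fidelity loop: score candidates with the predictor, then
    finetune only the most promising one at the chosen budget."""
    observed_curves = {}  # config (as a tuple of items) -> list of (budget, accuracy)
    best_config, best_acc = None, -1.0
    for _ in range(n_iterations):
        budget = random.choice(BUDGETS)
        candidates = [sample_configuration() for _ in range(n_candidates)]
        chosen = max(candidates,
                     key=lambda c: score_candidate(c, budget, observed_curves))
        acc = finetune_and_evaluate(chosen, budget)
        observed_curves.setdefault(tuple(chosen.items()), []).append((budget, acc))
        if acc > best_acc:
            best_config, best_acc = chosen, acc
    return best_config, best_acc


if __name__ == "__main__":
    config, acc = quick_tune_loop()
    print("best configuration:", config, "accuracy:", acc)
```

In a full system, the random stand-ins would be replaced by the predictor meta-learned on the 20k+ evaluated learning curves and by real finetuning runs, so that inexpensive predictions at low budgets steer which (model, hyperparameter) pairs receive more compute.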
