Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt
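The recipe the title describes is straightforward: fine-tune one pre-trained model several times with different hyperparameters, then average the resulting weights elementwise (a "uniform soup"), so inference costs the same as a single model rather than an ensemble. Below is a minimal sketch of that averaging step, assuming PyTorch state dicts from architecturally identical checkpoints; the `uniform_soup` helper and the checkpoint paths are hypothetical illustrations, not the authors' released code.

```python
import torch

def uniform_soup(state_dicts):
    """Elementwise average of a list of state dicts with identical keys and shapes."""
    soup = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        # Cast back so integer buffers (e.g. BatchNorm counters) keep their dtype.
        soup[key] = stacked.mean(dim=0).to(state_dicts[0][key].dtype)
    return soup

# Hypothetical usage: average three fine-tuning runs of the same model.
# checkpoints = [torch.load(p, map_location="cpu") for p in ("run1.pt", "run2.pt", "run3.pt")]
# model.load_state_dict(uniform_soup(checkpoints))
```

The paper additionally proposes a "greedy soup," which sorts checkpoints by validation accuracy and adds each to the running average only if it improves held-out accuracy; the same averaging helper applies to whichever subset survives that selection.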