How stable are Transferability Metrics evaluations?

Transferability metrics is a maturing field with increasing interest, which aims at providing heuristics for selecting the most suitable source models to transfer to a given target dataset, without fine-tuning them all. However, existing works rely on custom experimental setups which differ across papers, leading to inconsistent conclusions about which transferability metrics work best. In this paper we conduct a large-scale study by systematically constructing a broad range of 715k experimental setup variations. We discover that even small variations to an experimental setup lead to different conclusions about the superiority of a transferability metric over another. Then we propose better evaluations by aggregating across many experiments, enabling to reach more stable conclusions. As a result, we reveal the superiority of LogME at selecting good source datasets to transfer from in a semantic segmentation scenario, N LEEP at selecting good source architectures in an image classification scenario, and GBC at determining which target task benefits most from a given source model. Yet, no single transferability metric works best in all scenarios.

[1]  J. Uijlings,et al.  Transferability Metrics for Selecting Source Model Ensembles , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  J. Uijlings,et al.  Transferability Estimation using Bhattacharyya Class Separability , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Behnam Neyshabur,et al.  Exploring the Limits of Large Scale Pre-training , 2021, ICLR.

[4]  Judy Hoffman,et al.  Scalable Diverse Model Selection for Accessible Transfer Learning , 2021, NeurIPS.

[5]  Rahul Mazumder,et al.  Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance , 2021, ArXiv.

[6]  Lihi Zelnik-Manor,et al.  ImageNet-21K Pretraining for the Masses , 2021, NeurIPS Datasets and Benchmarks.

[7]  Shao-Lun Huang,et al.  OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Thomas Mensink,et al.  Factors of Influence for Transfer Learning Across Diverse Appearance Domains and Task Types , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Mingsheng Long,et al.  LogME: Practical Assessment of Pre-trained Models for Transfer Learning , 2021, ICML.

[10]  Neil Houlsby,et al.  Supervised Transfer Learning at Scale for Medical Imaging , 2021, ArXiv.

[11]  Yukun Zhu,et al.  Ranking Neural Checkpoints , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Xinlei Chen,et al.  Exploring Simple Siamese Representation Learning , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Yang Zhao,et al.  Deep High-Resolution Representation Learning for Visual Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Kshitij Dwivedi,et al.  Duality Diagram Similarity: a generic framework for initialization selection in task transfer learning , 2020, ECCV.

[15]  Scott M. Jordan,et al.  Evaluating the Performance of Reinforcement Learning Algorithms , 2020, ICML.

[16]  Vladlen Koltun,et al.  MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Sadman Sakib Enan,et al.  Semantic Segmentation of Underwater Imagery: Dataset and Benchmark , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[18]  Yixin Chen,et al.  DEPARA: Deep Attribution Graph for Deep Knowledge Transferability , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Tal Hassner,et al.  LEEP: A New Measure to Evaluate Transferability of Learned Representations , 2020, ICML.

[20]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[21]  Nicolo Fusi,et al.  Geometric Dataset Distances via Optimal Transport , 2020, NeurIPS.

[22]  Gerhard Nahler,et al.  Pearson Correlation Coefficient , 2020, Definitions.

[23]  S. Fidler,et al.  Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  A. Micheli,et al.  A Fair Comparison of Graph Neural Networks for Graph Classification , 2019, ICLR.

[25]  Ross B. Girshick,et al.  Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  J. Uijlings,et al.  The Open Images Dataset V4 , 2018, International Journal of Computer Vision.

[27]  Trevor Darrell,et al.  BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling , 2018, ArXiv.

[28]  Yoshua Bengio,et al.  Benchmarking Graph Neural Networks , 2023, J. Mach. Learn. Res..

[29]  Yixin Chen,et al.  Deep Model Transferability from Attribution Maps , 2019, NeurIPS.

[30]  Leonidas J. Guibas,et al.  An Information-Theoretic Approach to Transferability in Task Transfer Learning , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[31]  Tal Hassner,et al.  Transferability and Hardness of Supervised Classification Tasks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Gabriela Csurka,et al.  Visual Localization by Learning Objects-Of-Interest Dense Match Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[34]  Quoc V. Le,et al.  Searching for MobileNetV3 , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Ling Shao,et al.  iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images , 2019, CVPR Workshops.

[36]  Kshitij Dwivedi,et al.  Representation Similarity Analysis for Efficient Task Taxonomy & Transfer Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[38]  Dianhui Chu,et al.  Empirical Study and Improvement on Deep Transfer Learning for Human Activity Recognition , 2018, Sensors.

[39]  C. V. Jawahar,et al.  IDD: A Dataset for Exploring Problems of Autonomous Navigation in Unconstrained Environments , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[40]  Carsten Rother,et al.  Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Christoph H. Lampert,et al.  Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Quoc V. Le,et al.  Domain Adaptive Transfer Learning with Specialist Models , 2018, ArXiv.

[43]  Stephan Günnemann,et al.  Pitfalls of Graph Neural Network Evaluation , 2018, ArXiv.

[44]  Pierre-Yves Oudeyer,et al.  How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments , 2018, ArXiv.

[45]  Leonidas J. Guibas,et al.  Taskonomy: Disentangling Task Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Jiebo Luo,et al.  DOTA: A Large-Scale Dataset for Object Detection in Aerial Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Andreas Geiger,et al.  Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes , 2017, International Journal of Computer Vision.

[49]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Vittorio Ferrari,et al.  COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Oliver Zendel,et al.  How Good Is My Test Data? Introducing Safety Analysis for Computer Vision , 2017, International Journal of Computer Vision.

[53]  Peter Kontschieder,et al.  The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[54]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[56]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  C A Nelson,et al.  Learning to Learn , 2017, Encyclopedia of Machine Learning and Data Mining.

[60]  Trevor Darrell,et al.  Best Practices for Fine-Tuning Visual Classifiers to New Domains , 2016, ECCV Workshops.

[61]  Alexei A. Efros,et al.  What makes ImageNet good for transfer learning? , 2016, ArXiv.

[62]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[65]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[66]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Bernt Schiele,et al.  What Makes for Effective Detection Proposals? , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[68]  Atsuto Maki,et al.  Factors of Transferability for a Generic ConvNet Representation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[69]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[71]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[72]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[73]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[74]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[75]  Jaewook Jung,et al.  Results of the ISPRS benchmark on urban object detection and 3D building reconstruction , 2014 .

[76]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[77]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[78]  Iasonas Kokkinos,et al.  Describing Textures in the Wild , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[79]  Derek Hoiem,et al.  Diagnosing Error in Object Detectors , 2012, ECCV.

[80]  C. V. Jawahar,et al.  Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  Fei-Fei Li,et al.  Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs , 2012 .

[82]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[83]  Shimon Whiteson,et al.  Introduction to the special issue on empirical evaluations in reinforcement learning , 2011, Machine Learning.

[84]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[85]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[86]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[87]  Roberto Cipolla,et al.  Semantic object classes in video: A high-definition ground truth database , 2009, Pattern Recognit. Lett..

[88]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[89]  Andrew Zisserman,et al.  Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[90]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[91]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[92]  Lorien Y. Pratt,et al.  Discriminability-Based Transfer between Neural Networks , 1992, NIPS.

[93]  M. Kendall A NEW MEASURE OF RANK CORRELATION , 1938 .