Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning

Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method, fine-tuning all parameters of the source model to the target domain, possibly because fine-tuning allows the model to leverage useful information from intermediate layers that is otherwise discarded by the later pretrained layers. We explore the hypothesis that these intermediate layers might be directly exploited. We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target domain. In evaluations on the Visual Task Adaptation Benchmark (VTAB), Head2Toe matches the performance obtained with fine-tuning on average while reducing training and storage cost a hundredfold or more; critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning.

We demonstrate the effectiveness of group lasso at identifying relevant intermediate features of a ResNet-50 trained on ImageNet. We rank all features by their relevance score, $s_i$, and select groups of 2048 consecutive features beginning at a particular offset in this ranking. Offset 0 corresponds to selecting the features with the largest relevance. We calculate average test accuracies across all VTAB tasks. As the figure shows, test accuracy decreases monotonically with the offset, indicating that the relevance score predicts the importance of including a feature in the linear classifier.
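The ranking-and-offset selection described above can be illustrated with a minimal sketch. It assumes, as is common with group lasso, that the relevance score of feature i is the L2 norm of that feature's row in the weight matrix of a linear head; the text here does not define the score, so this choice, the function names `relevance_scores` and `select_by_offset`, and the toy weights are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def relevance_scores(weights: np.ndarray) -> np.ndarray:
    """Assumed relevance score: L2 norm of each feature's weight row.

    `weights` has shape (n_features, n_classes), e.g. the weight matrix
    of a linear head trained with a group-lasso penalty.
    """
    return np.linalg.norm(weights, axis=1)


def select_by_offset(scores: np.ndarray, offset: int, group_size: int = 2048) -> np.ndarray:
    """Return indices of `group_size` consecutive features in the
    descending relevance ranking, starting at position `offset`.

    Offset 0 yields the most relevant features.
    """
    ranking = np.argsort(-scores, kind="stable")  # most relevant first
    return ranking[offset:offset + group_size]


# Toy example: 6 features, 3 classes (hypothetical weights).
weights = np.array([
    [0.1, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 0.5],
])
scores = relevance_scores(weights)
top_group = select_by_offset(scores, offset=0, group_size=2)
next_group = select_by_offset(scores, offset=2, group_size=2)
```

In the setting described above, `group_size` would be 2048 and a linear classifier would be trained on each selected group; sweeping `offset` then shows how accuracy changes as less relevant features are used.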

