TranSlider: Transfer Ensemble Learning from Exploitation to Exploration

In transfer learning, what and where to transfer have been widely studied. Nevertheless, the learned transfer strategies are at high risk of over-fitting, especially when only a few annotated instances are available in the target domain. In this paper, we introduce transfer ensemble learning, a new direction for tackling the over-fitting of transfer strategies. Intuitively, models with different transfer strategies offer different perspectives on what and where to transfer; a core problem is therefore to search for diversely transferred models to ensemble so as to achieve better generalization. To this end, we propose the Transferability Slider (TranSlider) for transfer ensemble learning. By gradually decreasing the transferability, we obtain a spectrum of base models ranging from pure exploitation of the source model to unconstrained exploration of the target domain. Furthermore, decreasing the transferability with parameter sharing guarantees fast optimization at no additional training cost. Finally, we conduct extensive experiments with various analyses, which demonstrate that TranSlider achieves state-of-the-art performance on comprehensive benchmark datasets.
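Since the abstract describes the mechanism only at a high level, the sketch below gives one plausible reading of the idea in PyTorch: transferability is modeled as an L2 penalty pulling the fine-tuned weights toward the frozen source weights, its strength is annealed from 1 (pure exploitation of the source model) to 0 (unconstrained exploration of the target domain), and the base models captured along the way are ensembled by averaging their predictions. All function and parameter names (`translider_train`, `snapshot_lambdas`, `base_reg`, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the "transferability slider" idea (not the paper's code).
# Assumption: transferability = strength of an L2 pull toward the source weights,
# annealed from 1 to 0 during fine-tuning; snapshots along the way form the ensemble.
import copy
import torch
import torch.nn.functional as F


def translider_train(model, source_state, loader, epochs, snapshot_lambdas,
                     base_reg=1e-2, lr=1e-3, device="cpu"):
    """Fine-tune `model` while sliding transferability from 1 to 0,
    collecting a base model each time the slider crosses a snapshot value."""
    model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    source = {k: v.to(device) for k, v in source_state.items()}
    total_steps = epochs * len(loader)
    snapshots, step = [], 0
    remaining = sorted(snapshot_lambdas, reverse=True)  # take largest lambda first

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            lam = 1.0 - step / total_steps              # slider: 1 -> 0
            loss = F.cross_entropy(model(x), y)
            # Penalty toward the source weights, scaled by the slider.
            reg = sum(((p - source[n]) ** 2).sum()
                      for n, p in model.named_parameters() if n in source)
            (loss + lam * base_reg * reg).backward()
            opt.step()
            opt.zero_grad()
            step += 1
            if remaining and lam <= remaining[0]:       # capture a base model
                snapshots.append(copy.deepcopy(model).eval())
                remaining.pop(0)
    return snapshots


def ensemble_predict(snapshots, x):
    """Average the softmax outputs of the collected base models."""
    with torch.no_grad():
        probs = [F.softmax(m(x), dim=-1) for m in snapshots]
    return torch.stack(probs).mean(dim=0)
```

Because every base model is produced within a single fine-tuning run that shares parameters, the ensemble comes at no additional training cost beyond ordinary fine-tuning, which is the property the abstract emphasizes.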
