RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr

Fine-tuning the deep convolution neural network(CNN) using a pre-trained model helps transfer knowledge learned from larger datasets to the target task. While the accuracy could be largely improved even when the training dataset is small, the transfer learning outcome is usually constrained by the pre-trained model with close CNN weights (Liu et al., 2019), as the backpropagation here brings smaller updates to deeper CNN layers. In this work, we propose RIFLE - a simple yet effective strategy that deepens backpropagation in transfer learning settings, through periodically Re-Initializing the Fully-connected LayEr with random scratch during the fine-tuning procedure. RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning, while the effects of randomization can be easily converged throughout the overall learning procedure. The experiments show that the use of RIFLE significantly improves deep transfer learning accuracy on a wide range of datasets, out-performing known tricks for the similar purpose, such as Dropout, DropConnect, StochasticDepth, Disturb Label and Cyclic Learning Rate, under the same settings with 0.5% -2% higher testing accuracy. Empirical cases and ablation studies further indicate RIFLE brings meaningful updates to deep CNN layers with accuracy improved.

[1]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[2]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[3]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[4]  Leslie N. Smith,et al.  Cyclical Learning Rates for Training Neural Networks , 2015, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[5]  Tuo Zhao,et al.  Toward Understanding the Importance of Noise in Training Neural Networks , 2019, ICML.

[6]  Yu Zhang,et al.  Parameter Transfer Unit for Deep Neural Networks , 2018, PAKDD.

[7]  Tuo Zhao,et al.  Towards Understanding the Importance of Noise in Training Neural Networks , 2019, ICML 2019.

[8]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[9]  Alexei A. Efros,et al.  What makes ImageNet good for transfer learning? , 2016, ArXiv.

[10]  Jürgen Schmidhuber,et al.  Training Very Deep Networks , 2015, NIPS.

[11]  Yann LeCun,et al.  Regularization of Neural Networks using DropConnect , 2013, ICML.

[12]  Yann Le Cun,et al.  A Theoretical Framework for Back-Propagation , 1988 .

[13]  Yang Song,et al.  Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Yu Qiao,et al.  Sparse Deep Transfer Learning for Convolutional Neural Network , 2017, AAAI.

[15]  Haoyi Xiong,et al.  DELTA: DEep Learning Transfer using Feature Map with Attention for Convolutional Networks , 2019, ICLR.

[16]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[17]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[18]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[19]  Yizhou Yu,et al.  Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-Tuning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Qi Tian,et al.  DisturbLabel: Regularizing CNN on the Loss Layer , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Mehmet Aygun,et al.  Exploiting Convolution Filter Patterns for Transfer Learning , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[22]  Junmo Kim,et al.  A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Quoc V. Le,et al.  Do Better ImageNet Models Transfer Better? , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Xiu-Shen Wei,et al.  In Defense of Fully Connected Layers in Visual Representation Transfer , 2017, PCM.

[25]  Nathan Srebro,et al.  The Implicit Bias of Gradient Descent on Separable Data , 2017, J. Mach. Learn. Res..

[26]  Kilian Q. Weinberger,et al.  Deep Networks with Stochastic Depth , 2016, ECCV.

[27]  Nikos Komodakis,et al.  Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer , 2016, ICLR.

[28]  Xuhong Li,et al.  Explicit Inductive Bias for Transfer Learning with Convolutional Networks , 2018, ICML.

[29]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[30]  Yoshua Bengio,et al.  Deep Learning of Representations for Unsupervised and Transfer Learning , 2011, ICML Unsupervised and Transfer Learning.

[31]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[32]  Yuanzhi Li,et al.  Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.

[33]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[34]  Michael I. Jordan,et al.  Towards Understanding the Transferability of Deep Representations , 2019, ArXiv.