Unsupervised Knowledge Transfer Using Similarity Embeddings

With the advent of deep neural networks, there is growing interest in transferring the knowledge of a large and complex model to a smaller and faster one. In this brief, a method for unsupervised knowledge transfer (KT) between neural networks is proposed. To the best of our knowledge, the proposed method is the first to utilize similarity-induced embeddings to transfer knowledge between any two layers of neural networks, regardless of the number of neurons in each. In this way, the knowledge is transferred without using any lossy dimensionality reduction transformations and without requiring any information about the complex model apart from the activations of the layer used for KT. This is in contrast with most existing approaches, which only generate soft targets for training the smaller network or directly reuse the weights of the larger model. The proposed method is evaluated on six image data sets, and extensive experiments demonstrate that the knowledge of a neural network can be successfully transferred using different kinds of data, whether synthetic or not, ranging from cross-domain data to purely randomly generated data.
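To make the core idea concrete, the sketch below illustrates one plausible way such a similarity-matching objective could look: the pairwise similarity structure of a batch of student-layer activations is aligned with that of the corresponding teacher-layer activations, so the two layers may have different widths and no labels are required. This is a minimal, hypothetical sketch, not the paper's actual formulation; the function name, the Gaussian similarity kernel, and the mean-squared matching loss are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def similarity_kt_loss(teacher_feats, student_feats, sigma=1.0):
    """Hypothetical similarity-matching loss (illustrative only).

    teacher_feats: (N, d_t) activations of the teacher layer
    student_feats: (N, d_s) activations of the student layer
    The layer widths d_t and d_s may differ, since only the
    (N, N) similarity matrices are compared.
    """
    # Pairwise squared Euclidean distances within the batch
    t_dist = torch.cdist(teacher_feats, teacher_feats) ** 2
    s_dist = torch.cdist(student_feats, student_feats) ** 2

    # Gaussian (RBF) similarities induced by the distances
    # (an assumed kernel choice, not necessarily the paper's)
    t_sim = torch.exp(-t_dist / (2 * sigma ** 2))
    s_sim = torch.exp(-s_dist / (2 * sigma ** 2))

    # Match the two similarity matrices; no labels are needed,
    # so the transfer is unsupervised
    return F.mse_loss(s_sim, t_sim)

# Usage sketch: the teacher is frozen, only the student is trained
# teacher_feats = teacher_layer_output.detach()
# loss = similarity_kt_loss(teacher_feats, student_layer_output)
# loss.backward()
```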
