Multi-source Subnetwork-level Transfer in CNNs Using Filter-Trees

Convolutional Neural Networks (CNNs) are highly effective for many pattern recognition tasks, but training deep CNNs requires extensive computation and large amounts of training data. In this paper we propose the Bank of Filter-Trees (BFT) as a transfer learning mechanism for improving the efficiency of learning CNNs. A filter-tree corresponding to a filter in the $k^{th}$ convolutional layer of a CNN is the subnetwork consisting of that filter along with all its connections to filters in all preceding layers. An ensemble of such filter-trees, created from many CNNs learnt on different but related tasks, forms the BFT. To learn a new CNN, we sample a set of filter-trees from the BFT; these fix the first few layers of the target network, and only the remaining layers are learnt using the training data of the new task. Through simulations we demonstrate the effectiveness of this idea. The BFT constitutes a novel transfer learning technique in which transfer is at the subnetwork level; transfer can be effected from multiple source networks; the number of weights to be learnt is the same as for a single CNN; and, with no fine-tuning of the transferred weights, the performance achieved is quite good. In all our experiments the number of filter-trees sampled is kept equal to the number of filters in the $k^{th}$ layer of the new CNN. This is not a limitation; it merely keeps the number of filters freshly learnt in the subsequent layers equal to that of a single CNN, for a fair comparison.
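The construction described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each source network's convolutional stack is represented as a list of NumPy weight arrays of shape `(num_filters, in_channels, h, w)`, and the helper names (`filter_tree`, `build_bft`, `sample_front_end`) are hypothetical.

```python
import numpy as np

def filter_tree(source_layers, k, i):
    """Filter-tree for filter i in layer k of one source net: that
    filter's weights together with all preceding layers it connects to.
    (Hypothetical representation; each layer is an array of shape
    (num_filters, in_channels, h, w).)"""
    return {"filter": source_layers[k][i],
            "preceding": [w.copy() for w in source_layers[:k]]}

def build_bft(source_nets, k):
    """Bank of Filter-Trees: every layer-k filter-tree from every source net."""
    return [filter_tree(net, k, i)
            for net in source_nets
            for i in range(net[k].shape[0])]

def sample_front_end(bft, n, rng):
    """Sample n filter-trees without replacement; these fix the first
    k+1 layers of the target net, whose remaining layers are then
    trained on the new task."""
    picks = rng.choice(len(bft), size=n, replace=False)
    return [bft[j] for j in picks]

# Two source nets with the same architecture (8 filters in layer 0,
# 16 in layer 1), weights standing in for trained filters.
rng = np.random.default_rng(0)
net_a = [rng.standard_normal((8, 3, 3, 3)), rng.standard_normal((16, 8, 3, 3))]
net_b = [rng.standard_normal((8, 3, 3, 3)), rng.standard_normal((16, 8, 3, 3))]

bft = build_bft([net_a, net_b], k=1)     # 32 filter-trees in the bank
front = sample_front_end(bft, 16, rng)   # as many trees as layer-1 filters
```

Note that a sampled filter-tree carries its own copy of the preceding layers from its source net, so when trees from different sources are combined, each layer-$k$ filter connects only to the preceding filters of its own tree; this is why no fine-tuning of the transferred weights is needed.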