Knowledge Projection for Deep Neural Networks

While deeper and wider neural networks continue to push the performance limits of computer vision and machine learning tasks, they typically require large sets of labeled data for effective training and incur very high computational cost. In this paper, we develop a new framework for training deep neural networks on datasets with limited labeled samples using cross-network knowledge projection, which improves network performance while significantly reducing overall computational complexity. Specifically, a large pre-trained teacher network observes samples from the training data. A projection matrix is learned to project the teacher's knowledge, in the form of visual representations from an intermediate layer of the teacher network, into an intermediate layer of a thinner and faster student network, where it guides and regulates the student's training. Both the guidance layer of the teacher network and the injection layer of the student network are selected adaptively during training by iteratively evaluating a joint loss function. This knowledge projection framework allows crucial knowledge learned by large networks to guide the training of thinner student networks, avoiding over-fitting, achieving better network performance, and substantially reducing complexity. Extensive experimental results on benchmark datasets demonstrate that the proposed knowledge projection approach outperforms existing methods, improving accuracy by up to 4% while reducing network complexity by 4 to 10 times, which is very attractive for practical applications of deep neural networks.
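
To make the training mechanism concrete, the sketch below illustrates the general idea of feature-level guidance through a learned projection, in PyTorch-style Python. It is a minimal sketch under stated assumptions, not the paper's exact formulation: the projection matrix is realized here as a 1x1 convolution over channels, the guidance term is a simple mean-squared regression loss, and the names `teacher_intermediate`, `student_with_features`, and the weight `alpha` are hypothetical placeholders for the chosen teacher/student layers and the loss balance.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeProjection(nn.Module):
    """Projects teacher features into the student feature space.

    The projection "matrix" is realized as a 1x1 convolution over channels
    (an assumption made for this sketch); it is trained jointly with the student.
    """
    def __init__(self, teacher_channels, student_channels):
        super().__init__()
        self.proj = nn.Conv2d(teacher_channels, student_channels,
                              kernel_size=1, bias=False)

    def forward(self, teacher_feat):
        return self.proj(teacher_feat)

def joint_loss(student_logits, labels, student_feat, projected_teacher_feat, alpha=0.5):
    """Joint objective: supervised task loss plus a regression term that pulls the
    student's injection-layer features toward the projected teacher knowledge."""
    task_loss = F.cross_entropy(student_logits, labels)
    # Spatial sizes of the two feature maps are assumed to match here;
    # otherwise they would need to be resized before comparison.
    guidance_loss = F.mse_loss(student_feat, projected_teacher_feat)
    return task_loss + alpha * guidance_loss

# Hypothetical training step: the teacher is frozen, while the student and the
# projection module are updated together.
#
# teacher.eval()
# with torch.no_grad():
#     t_feat = teacher_intermediate(images)        # features from the selected teacher layer
# s_feat, logits = student_with_features(images)   # student injection-layer features + predictions
# loss = joint_loss(logits, labels, s_feat, projection(t_feat))
# loss.backward()
# optimizer.step()
```

In this reading, the adaptive layer selection described above would amount to repeating such training steps for candidate teacher/student layer pairs and keeping the pair that minimizes the joint loss; the exact selection procedure is defined by the paper itself.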
