Towards effective deep transfer via attentive feature alignment

Training a deep convolutional network from scratch requires a large amount of labeled data, which may not be available for many practical tasks. To alleviate the data burden, a practical approach is to adapt a model pre-trained on a large source domain to the target domain, but performance can be limited when the source and target data distributions differ substantially. Some recent works attempt to alleviate this issue by imposing feature alignment on the intermediate feature maps of the source and target networks. However, for a source model, many of the channels and spatial features in each layer can be irrelevant to the target task, so directly applying feature alignment may not achieve promising performance. In this paper, we propose an Attentive Feature Alignment (AFA) method for effective domain knowledge transfer that identifies and attends to the relevant channels and spatial features between the two domains. To this end, we devise two learnable attentive modules, one at the channel level and one at the spatial level. We then sequentially perform attentive spatial- and channel-level feature alignments between the source and target networks, in which the target model and the attentive modules are learned simultaneously. Moreover, we theoretically analyze the generalization performance of our method, which confirms its superiority to existing methods. Extensive experiments on both image classification and face recognition demonstrate the effectiveness of our method. The source code and the pre-trained models are available at https://github.com/xiezheng-cs/AFA.
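
As a rough illustration of what attention-weighted feature alignment can look like, the following NumPy sketch weights a squared feature discrepancy by per-channel or per-position attention. It is not the released AFA implementation: the function names, the use of softmax-normalized score arrays in place of learned attentive modules, and the squared-error discrepancy are all assumptions made here for illustration.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def channel_attentive_alignment(f_src, f_tgt, channel_scores):
    """Hypothetical channel-level alignment loss.

    f_src, f_tgt: feature maps of shape (N, C, H, W) taken from the same
        layer of the source and target networks.
    channel_scores: array of shape (C,); stands in for the output of a
        learnable channel attention module (an assumption of this sketch).
    """
    weights = softmax(channel_scores, axis=0)                    # (C,), sums to 1
    per_channel = ((f_src - f_tgt) ** 2).mean(axis=(0, 2, 3))    # (C,)
    return float((weights * per_channel).sum())


def spatial_attentive_alignment(f_src, f_tgt, spatial_scores):
    """Hypothetical spatial-level alignment loss.

    spatial_scores: array of shape (H, W); stands in for the output of a
        learnable spatial attention module (an assumption of this sketch).
    """
    h, w = spatial_scores.shape
    weights = softmax(spatial_scores.reshape(-1), axis=0).reshape(h, w)
    per_position = ((f_src - f_tgt) ** 2).mean(axis=(0, 1))      # (H, W)
    return float((weights * per_position).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_src = rng.normal(size=(2, 3, 4, 4))
    f_tgt = rng.normal(size=(2, 3, 4, 4))
    # Spatial-level term followed by channel-level term, mirroring the
    # sequential order described in the abstract.
    loss_spatial = spatial_attentive_alignment(f_src, f_tgt, np.zeros((4, 4)))
    loss_channel = channel_attentive_alignment(f_src, f_tgt, np.zeros(3))
    print(loss_spatial, loss_channel)
```

With uniform scores both terms reduce to the plain mean squared difference between the feature maps, which corresponds to unweighted feature alignment. Non-uniform scores down-weight channels or positions judged irrelevant to the target task. In a real training setup the scores would be produced by modules trained jointly with the target network.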
