Transferable Normalization: Towards Improving Transferability of Deep Neural Networks

Deep neural networks (DNNs) excel at learning representations when trained on large-scale datasets, and pre-trained DNNs also show strong transferability when fine-tuned on other labeled datasets. However, this transferability weakens when the target dataset is fully unlabeled, as in Unsupervised Domain Adaptation (UDA). We conjecture that the loss of transferability stems from an intrinsic limitation in the architecture design of DNNs. In this paper, we examine the components of DNN architectures and propose Transferable Normalization (TransNorm) as a replacement for existing normalization techniques. TransNorm is an end-to-end trainable layer that makes DNNs more transferable across domains. As a general method, TransNorm can be readily applied to a variety of deep neural networks and domain adaptation methods without introducing any extra hyper-parameters or learnable parameters. Empirical results show that TransNorm not only improves classification accuracy but also accelerates convergence for mainstream DNN-based domain adaptation methods.
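To make the idea concrete, the following is a minimal NumPy sketch of a transferable-normalization-style layer, not the paper's exact implementation. It assumes the two ingredients the abstract implies: each domain is standardized with its own batch statistics (rather than statistics pooled across domains), and channels whose source and target statistics lie closer together are up-weighted as more transferable. The function name `transnorm_sketch` and the specific weighting formula are illustrative assumptions.

```python
import numpy as np

def transnorm_sketch(x_s, x_t, gamma, beta, eps=1e-5):
    """Illustrative sketch for 2-D features of shape (batch, channels).

    Each domain is normalized with its own batch statistics; the affine
    parameters (gamma, beta) are shared, so no extra learnable parameters
    are introduced. Channels whose source/target statistics are closer
    receive larger adaptation weights (a softmax-like weighting over
    inverse statistic distances), emphasizing transferable channels.
    """
    mu_s, var_s = x_s.mean(axis=0), x_s.var(axis=0)
    mu_t, var_t = x_t.mean(axis=0), x_t.var(axis=0)

    # Domain-specific standardization followed by a shared affine transform.
    xs_hat = gamma * (x_s - mu_s) / np.sqrt(var_s + eps) + beta
    xt_hat = gamma * (x_t - mu_t) / np.sqrt(var_t + eps) + beta

    # Per-channel distance between the domains' normalized statistics.
    d = np.abs(mu_s / np.sqrt(var_s + eps) - mu_t / np.sqrt(var_t + eps))

    # Softmax-like weights over inverse distances: closer statistics
    # (smaller d) yield larger weights; the weights sum to 1.
    w = np.exp(1.0 / (1.0 + d))
    alpha = w / w.sum()

    # Residual re-weighting keeps the layer close to standard
    # normalization while gently emphasizing transferable channels.
    num_channels = x_s.shape[1]
    scale = 1.0 + num_channels * alpha
    return xs_hat * scale, xt_hat * scale, alpha
```

In this sketch the layer is drop-in: it consumes and produces features of the same shape as batch normalization, which is consistent with the claim that the method applies to existing networks without new hyper-parameters.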
