Multiple Source Domain Adaptation with Adversarial Training of Neural Networks

While domain adaptation has been actively researched in recent years, most theoretical results and algorithms focus on the single-source, single-target setting. Naively applying such algorithms to the multiple-source domain adaptation problem may lead to suboptimal solutions. As a step toward bridging this gap, we propose a new generalization bound for domain adaptation when there are multiple source domains with labeled instances and one target domain with unlabeled instances. Compared with existing bounds, the new bound requires neither expert knowledge about the target distribution nor an optimal combination rule for the multiple source domains. Interestingly, our theory also leads to an efficient learning strategy using adversarial neural networks: we show how to interpret it as learning feature representations that are invariant to the multiple domain shifts while remaining discriminative for the learning task. To this end, we propose two models, both of which we call multisource domain adversarial networks (MDANs): the first optimizes our bound directly, while the second is a smoothed approximation of the first, leading to a more data-efficient and task-adaptive model. The optimization problems of both models are minimax saddle-point problems that can be solved by adversarial training. To demonstrate the effectiveness of MDANs, we conduct extensive experiments showing superior adaptation performance on three real-world tasks: sentiment analysis, digit classification, and vehicle counting.
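
The abstract frames MDAN training as a minimax saddle-point problem between a shared feature extractor and per-source domain discriminators. Below is a minimal sketch of how such a setup could look, assuming a PyTorch implementation with a gradient-reversal layer and a log-sum-exp smoothing of the per-source objectives; the class names, network sizes, the gamma hyperparameter, and the data interface are illustrative assumptions, not the authors' reference code.

```python
# Sketch of a smoothed (soft-max) multisource domain adversarial network.
# Assumptions: one shared feature extractor, one task classifier, and one
# binary domain discriminator per source domain.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward
    pass, so the min-max game can be optimized as a single minimization."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


class MDAN(nn.Module):
    def __init__(self, feat_dim=100, num_classes=2, num_sources=3):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.classifier = nn.Linear(64, num_classes)
        # One binary domain discriminator per (source, target) pair.
        self.discriminators = nn.ModuleList(
            [nn.Linear(64, 2) for _ in range(num_sources)]
        )

    def forward(self, src_xs, src_ys, tgt_x, lambd=1.0):
        """src_xs / src_ys: lists of per-source minibatches; tgt_x: unlabeled target batch."""
        task_losses, domain_losses = [], []
        tgt_feat = self.features(tgt_x)
        for i, (xs, ys) in enumerate(zip(src_xs, src_ys)):
            src_feat = self.features(xs)
            # Supervised task loss on labeled source data.
            task_losses.append(F.cross_entropy(self.classifier(src_feat), ys))
            # Domain loss: discriminator i separates source i from the target,
            # while the feature extractor (via gradient reversal) tries to confuse it.
            d_src = self.discriminators[i](grad_reverse(src_feat, lambd))
            d_tgt = self.discriminators[i](grad_reverse(tgt_feat, lambd))
            domain_losses.append(
                F.cross_entropy(d_src, torch.zeros(len(xs), dtype=torch.long))
                + F.cross_entropy(d_tgt, torch.ones(len(tgt_x), dtype=torch.long))
            )
        return torch.stack(task_losses), torch.stack(domain_losses)


def mdan_loss(task_losses, domain_losses, gamma=10.0):
    # Smoothed combination of the per-source objectives via log-sum-exp;
    # the hard-max variant would instead take the maximum over sources.
    per_source = task_losses + domain_losses
    return torch.logsumexp(gamma * per_source, dim=0) / gamma
```

The soft-max combination keeps gradients flowing to every source domain (weighted toward the worst-performing one), which is what makes the smoothed variant more data-efficient than optimizing the hard maximum alone.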
