Adversarial Representation Learning for Domain Adaptation

Domain adaptation aims to generalize a high-performance learner to a target domain by leveraging knowledge distilled from a source domain with a different but related data distribution. One family of domain adaptation methods learns feature representations that are invariant to the change of domains yet discriminative for the prediction task. Recently, generative adversarial networks (GANs) have been widely studied as a way to learn a generator that approximates the true data distribution by trying to fool an adversarial discriminator in a minimax game. Inspired by GANs, we propose a novel Adversarial Representation learning approach for Domain Adaptation (ARDA) that learns high-level feature representations which are both domain-invariant and target-discriminative, in order to tackle the cross-domain classification problem. Specifically, the approach exploits the differentiability of the Wasserstein distance as a measure of distribution divergence by incorporating the Wasserstein GAN. Our architecture consists of three parts: a feature generator that produces the desired features from inputs of both domains, a critic that estimates the Wasserstein distance between the generated feature distributions, and an adaptive classifier that performs the final classification task. Empirical studies on four common domain adaptation datasets demonstrate that the proposed ARDA outperforms state-of-the-art domain-invariant feature learning approaches.
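The adversarial part of the objective can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the authors' implementation: a shared linear-ReLU `generator` stands in for the feature generator, a weight-clipped linear `critic` plays the role of the 1-Lipschitz critic (weight clipping as in the original Wasserstein GAN), and the quantity `w_dist` is the empirical Wasserstein-1 estimate that the critic maximizes and the generator minimizes.

```python
import numpy as np

# Illustrative sketch of the ARDA adversarial objective (all names and
# shapes are hypothetical assumptions, not the paper's code).

rng = np.random.default_rng(0)

def generator(x, W):
    """Shared feature generator: one linear layer with a ReLU."""
    return np.maximum(x @ W, 0.0)

def critic(h, v):
    """Linear critic; weight clipping approximates the 1-Lipschitz
    constraint required by the Wasserstein-1 dual formulation."""
    v = np.clip(v, -0.01, 0.01)
    return h @ v

# Toy batches: the target domain is a shifted version of the source.
xs = rng.normal(0.0, 1.0, size=(64, 4))   # source-domain inputs
xt = rng.normal(0.5, 1.0, size=(64, 4))   # target-domain inputs

W = rng.normal(size=(4, 3))               # generator parameters
v = rng.normal(size=(3,))                 # critic parameters

hs = generator(xs, W)                     # source features
ht = generator(xt, W)                     # target features

# Empirical Wasserstein estimate: the critic's parameters are trained to
# maximize this gap, while the generator is trained to minimize it
# (jointly with a standard classification loss on labeled source data).
w_dist = critic(hs, v).mean() - critic(ht, v).mean()
print(f"estimated Wasserstein distance: {w_dist:.4f}")
```

In a full training loop the critic step and the generator/classifier step would alternate, with the classifier loss computed only on the labeled source features.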