Joint distribution optimal transportation for domain adaptation

This paper deals with the unsupervised domain adaptation problem, where one wants to estimate a prediction function $f$ in a given target domain without any labeled sample by exploiting the knowledge available from a source domain where labels are known. Our work makes the following assumption: there exists a non-linear transformation between the joint feature/label space distributions of the two domain $\mathcal{P}_s$ and $\mathcal{P}_t$. We propose a solution of this problem with optimal transport, that allows to recover an estimated target $\mathcal{P}^f_t=(X,f(X))$ by optimizing simultaneously the optimal coupling and $f$. We show that our method corresponds to the minimization of a bound on the target error, and provide an efficient algorithmic solution, for which convergence is proved. The versatility of our approach, both in terms of class of hypothesis or loss functions is demonstrated with real world classification and regression problems, for which we reach or surpass state-of-the-art results.

[1]  Luigi Grippo,et al.  On the convergence of the block nonlinear Gauss-Seidel method under convex constraints , 2000, Oper. Res. Lett..

[2]  Lorenzo Rosasco,et al.  Are Loss Functions All the Same? , 2004, Neural Computation.

[3]  C. Villani,et al.  Quantitative Concentration Inequalities for Empirical Measures on Non-compact Spaces , 2005, math/0503123.

[4]  L. Kantorovich On the Translocation of Masses , 2006 .

[5]  John Blitzer,et al.  Domain Adaptation with Structural Correspondence Learning , 2006, EMNLP.

[6]  Motoaki Kawanabe,et al.  Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation , 2007, NIPS.

[7]  C. Villani Optimal Transport: Old and New , 2008 .

[8]  Koby Crammer,et al.  A theory of learning from different domains , 2010, Machine Learning.

[9]  Yishay Mansour,et al.  Domain Adaptation: Learning Bounds and Algorithms , 2009, COLT.

[10]  Dacheng Tao,et al.  Bregman Divergence-Based Regularization for Transfer Subspace Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[11]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[12]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[13]  Qiang Yang,et al.  Cross Validation Framework to Choose amongst Models and Datasets for Transfer Learning , 2010, ECML/PKDD.

[14]  Shai Ben-David,et al.  Access to Unlabeled Data can Speed up Prediction Time , 2011, ICML.

[15]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Kilian Q. Weinberger,et al.  Marginalized Denoising Autoencoders for Domain Adaptation , 2012, ICML.

[17]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[18]  Tinne Tuytelaars,et al.  Unsupervised Visual Domain Adaptation Using Subspace Alignment , 2013, 2013 IEEE International Conference on Computer Vision.

[19]  Brian C. Lovell,et al.  Unsupervised Domain Adaptation by Domain Invariant Projection , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Ruth Urner,et al.  Domain adaptation–can quantity compensate for quality? , 2013, Annals of Mathematics and Artificial Intelligence.

[21]  Ivan Marsic,et al.  Covariate Shift in Hilbert Space: A Solution via Sorrogate Kernels , 2013, ICML.

[22]  Philip S. Yu,et al.  Adaptation Regularization: A General Framework for Transfer Learning , 2014, IEEE Transactions on Knowledge and Data Engineering.

[23]  Philip S. Yu,et al.  Transfer Joint Matching for Unsupervised Domain Adaptation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Nicolas Courty,et al.  Domain Adaptation with Regularized Optimal Transport , 2014, ECML/PKDD.

[25]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[26]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[27]  Bernhard Schölkopf,et al.  Multi-Source Domain Adaptation: A Causal View , 2015, AAAI.

[28]  F. Santambrogio Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling , 2015 .

[29]  Filippo Santambrogio,et al.  Optimal Transport for Applied Mathematicians , 2015 .

[30]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[31]  Gabriel Peyré,et al.  Stochastic Optimization for Large-scale Optimal Transport , 2016, NIPS.

[32]  Nicolas Courty,et al.  Mapping Estimation for Discrete Optimal Transport , 2016, NIPS.

[33]  Bernhard Schölkopf,et al.  Domain Adaptation with Conditional Transferable Components , 2016, ICML.

[34]  Gustavo K. Rohde,et al.  A Transportation Lp Distance for Signal Analysis , 2016, ArXiv.

[35]  Nicolas Courty,et al.  Optimal Transport for Domain Adaptation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.