Metric Learning in Optimal Transport for Domain Adaptation

Domain adaptation aims to leverage a labeled dataset drawn from a source distribution to learn a model for examples generated according to a different but related target distribution. Creating a domain-invariant representation between the source and target domains is the most widely used technique. A simple and robust way to perform this task consists of (i) representing the two domains by subspaces spanned by their respective eigenvectors and (ii) seeking a mapping function that aligns them. In this paper, we propose to use Optimal Transport (OT) and its associated Wasserstein distance to perform this alignment. While the idea of using OT in domain adaptation is not new, the original contribution of this paper is two-fold: (i) we derive a generalization bound on the target error involving several Wasserstein distances, which prompts us to optimize the ground metric of OT to reduce the target risk; (ii) building on this theoretical analysis, we design an algorithm (MLOT) that optimizes a Mahalanobis distance, leading to a transportation plan that adapts better. Experiments demonstrate the effectiveness of this original approach.
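The two ingredients combined above, subspace alignment and optimal transport under a Mahalanobis ground metric, can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' MLOT implementation: the helpers `pca_basis`, `subspace_alignment`, `mahalanobis_cost` and `sinkhorn` are hypothetical names; the alignment step follows the usual subspace-alignment recipe (source basis mapped through A = P_s^T P_t); the transport plan is computed with entropy-regularized Sinkhorn iterations rather than exact OT; and the Mahalanobis matrix M is fixed by hand, whereas learning it is precisely what the paper proposes.

```python
import numpy as np


def pca_basis(X, d):
    """Return the top-d eigenvectors of the covariance of X as columns (D x d)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:d]   # indices of the d largest
    return eigvecs[:, order]


def subspace_alignment(Xs, Xt, d):
    """Map the source subspace onto the target subspace with A = Ps^T Pt."""
    Ps, Pt = pca_basis(Xs, d), pca_basis(Xt, d)
    A = Ps.T @ Pt                              # d x d alignment matrix
    Zs = (Xs - Xs.mean(axis=0)) @ Ps @ A       # source, expressed in aligned coordinates
    Zt = (Xt - Xt.mean(axis=0)) @ Pt           # target, in its own subspace
    return Zs, Zt


def mahalanobis_cost(U, V, M):
    """Pairwise squared Mahalanobis costs (u - v)^T M (u - v) for a PSD matrix M."""
    diff = U[:, None, :] - V[None, :, :]       # shape (n, m, d)
    return np.einsum("nmi,ij,nmj->nm", diff, M, diff)


def sinkhorn(a, b, C, reg, n_iter=5000, tol=1e-9):
    """Entropy-regularized OT plan between histograms a and b for cost C."""
    K = np.exp(-C / reg)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                        # rescale rows to match a
        v = b / (K.T @ u)                      # rescale columns to match b
        if np.abs(u * (K @ v) - a).sum() < tol:  # row marginals still satisfied?
            break
    return u[:, None] * K * v[None, :]


# Toy data: a shifted and rescaled target domain (synthetic, for illustration only).
rng = np.random.default_rng(0)
Xs = rng.normal(size=(40, 5))
Xt = 1.5 * rng.normal(size=(30, 5)) + 1.0
Zs, Zt = subspace_alignment(Xs, Xt, d=2)

# Hypothetical fixed metric M = L^T L (PSD by construction); MLOT would learn it instead.
L = np.array([[1.0, 0.3], [0.0, 0.5]])
M = L.T @ L
C = mahalanobis_cost(Zs, Zt, M)
C = C / C.max()                                # normalize so exp(-C/reg) does not underflow

a = np.full(len(Zs), 1.0 / len(Zs))            # uniform source weights
b = np.full(len(Zt), 1.0 / len(Zt))            # uniform target weights
G = sinkhorn(a, b, C, reg=0.1)
print("transport cost:", float((G * C).sum()))
```

The regularization strength `reg` (an assumption of this sketch) trades exactness for speed and a smoother plan: smaller values approach the unregularized Wasserstein plan but need more iterations. Changing `M` changes which source and target points count as close, which is the lever a ground-metric-learning approach acts on.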
