Hierarchical Optimal Transport for Unsupervised Domain Adaptation

In this paper, we propose a novel approach for unsupervised domain adaptation, that relates notions of optimal transport, learning probability measures and unsupervised learning. The proposed approach, HOT-DA, is based on a hierarchical formulation of optimal transport, that leverages beyond the geometrical information captured by the ground metric, richer structural information in the source and target domains. The additional information in the labeled source domain is formed instinctively by grouping samples into structures according to their class labels. While exploring hidden structures in the unlabeled target domain is reduced to the problem of learning probability measures through Wasserstein barycenter, which we prove to be equivalent to spectral clustering. Experiments on a toy dataset with controllable complexity and two challenging visual adaptation datasets show the superiority of the proposed approach over the state-of-the-art.

[1]  O. Gaans Probability measures on metric spaces , 2022 .

[2]  C. Ding,et al.  Spectral relaxation models and structure analysis for K-way graph clustering and bi-clustering , 2001 .

[3]  Motoaki Kawanabe,et al.  Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation , 2007, NIPS.

[4]  Jianbo Shi,et al.  Multiclass spectral clustering , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  Christoph Schnörr,et al.  A Hierarchical Approach to Optimal Transport , 2013, SSVM.

[6]  Diane J. Cook,et al.  A Survey of Unsupervised Deep Domain Adaptation , 2018, ACM Trans. Intell. Syst. Technol..

[7]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.

[8]  Tommi S. Jaakkola,et al.  Structured Optimal Transport , 2018, AISTATS.

[9]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[10]  Carole Lartizien,et al.  Margin-aware Adversarial Domain Adaptation with Optimal Transport , 2020, ICML.

[11]  C. Villani Optimal Transport: Old and New , 2008 .

[12]  Marco Cuturi,et al.  Computational Optimal Transport: With Applications to Data Science , 2019 .

[13]  Justin Solomon,et al.  Hierarchical Optimal Transport for Document Representation , 2019, NeurIPS.

[14]  Michael Werman,et al.  Fast and robust Earth Mover's Distances , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[15]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[16]  L. Kantorovich On the Translocation of Masses , 2006 .

[17]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Koby Crammer,et al.  Analysis of Representations for Domain Adaptation , 2006, NIPS.

[19]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[20]  David Pollard,et al.  Quantization and the method of k -means , 1982, IEEE Trans. Inf. Theory.

[21]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[22]  Chris H. Q. Ding,et al.  On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering , 2005, SDM.

[23]  Nasser M. Nasrabadi,et al.  Transporting Labels via Hierarchical Optimal Transport for Semi-Supervised Learning , 2020, ECCV.

[24]  Ivor W. Tsang,et al.  Domain Adaptation via Transfer Component Analysis , 2009, IEEE Transactions on Neural Networks.

[25]  Philip A. Knight,et al.  The Sinkhorn-Knopp Algorithm: Convergence and Applications , 2008, SIAM J. Matrix Anal. Appl..

[26]  Guillaume Carlier,et al.  Barycenters in the Wasserstein Space , 2011, SIAM J. Math. Anal..

[27]  Ievgen Redko,et al.  Advances in Domain Adaptation Theory , 2019 .

[28]  Jason Altschuler,et al.  Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration , 2017, NIPS.

[29]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[30]  Eva L. Dyer,et al.  Hierarchical Optimal Transport for Multimodal Distribution Alignment , 2019, NeurIPS.

[31]  Lorenzo Rosasco,et al.  Learning Probability Measures with respect to Optimal Transport Metrics , 2012, NIPS.

[32]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[33]  Nicolas Courty,et al.  Optimal Transport for Multi-source Domain Adaptation under Target Shift , 2018, AISTATS.

[34]  Nicolas Courty,et al.  Optimal Transport for Domain Adaptation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Inderjit S. Dhillon,et al.  Kernel k-means: spectral clustering and normalized cuts , 2004, KDD.

[37]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[38]  Arthur Cayley,et al.  The Collected Mathematical Papers: On Monge's “Mémoire sur la théorie des déblais et des remblais” , 2009 .

[39]  Wouter M. Kouw,et al.  A Review of Domain Adaptation without Target Labels , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[41]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[42]  Nicolas Courty,et al.  Joint distribution optimal transportation for domain adaptation , 2017, NIPS.

[43]  Dinh Q. Phung,et al.  Multilevel Clustering via Wasserstein Means , 2017, ICML.

[44]  Arnaud Doucet,et al.  Fast Computation of Wasserstein Barycenters , 2013, ICML.

[45]  Chris H. Q. Ding,et al.  Spectral Relaxation for K-means Clustering , 2001, NIPS.

[46]  Filippo Santambrogio,et al.  Optimal Transport for Applied Mathematicians , 2015 .

[47]  Philip S. Yu,et al.  Transfer Feature Learning with Joint Distribution Adaptation , 2013, 2013 IEEE International Conference on Computer Vision.

[48]  H. Shimodaira,et al.  Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000 .

[49]  Julien Rabin,et al.  Regularized Discrete Optimal Transport , 2013, SIAM J. Imaging Sci..

[50]  Tinne Tuytelaars,et al.  Unsupervised Visual Domain Adaptation Using Subspace Alignment , 2013, 2013 IEEE International Conference on Computer Vision.

[51]  Sebastian Reich,et al.  A Nonparametric Ensemble Transform Method for Bayesian Inference , 2012, SIAM J. Sci. Comput..

[52]  John N. Tsitsiklis,et al.  Introduction to linear optimization , 1997, Athena scientific optimization and computation series.