A Unified Framework for Domain Adaptation using Metric Learning on Manifolds

We present a novel framework for domain adaptation, whereby both geometric and statistical differences between a labeled source domain and unlabeled target domain can be integrated by exploiting the curved Riemannian geometry of statistical manifolds. Our approach is based on formulating transfer from source to target as a problem of geometric mean metric learning on manifolds. Specifically, we exploit the curved Riemannian manifold geometry of symmetric positive definite (SPD) covariance matrices. We exploit a simple but important observation that as the space of covariance matrices is both a Riemannian space as well as a homogeneous space, the shortest path geodesic between two covariances on the manifold can be computed analytically. Statistics on the SPD matrix manifold, such as the geometric mean of two matrices can be reduced to solving the well-known Riccati equation. We show how the Ricatti-based solution can be constrained to not only reduce the statistical differences between the source and target domains, such as aligning second order covariances and minimizing the maximum mean discrepancy, but also the underlying geometry of the source and target domains using diffusions on the underlying source and target manifolds. A key strength of our proposed approach is that it enables integrating multiple sources of variation between source and target in a unified way, by reducing the combined objective function to a nested set of Ricatti equations where the solution can be represented by a cascaded series of geometric mean computations. In addition to showing the theoretical optimality of our solution, we present detailed experiments using standard transfer learning testbeds from computer vision comparing our proposed algorithms to past work in domain adaptation, showing improved results over a large variety of previous methods.

[1]  Koby Crammer,et al.  Analysis of Representations for Domain Adaptation , 2006, NIPS.

[2]  Alan Edelman,et al.  The Geometry of Algorithms with Orthogonality Constraints , 1998, SIAM J. Matrix Anal. Appl..

[3]  Nicolò Cesa-Bianchi Learning the Distribution in the Extended PAC Model , 1990, ALT.

[4]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[5]  Shiguang Shan,et al.  Generalized Unsupervised Manifold Alignment , 2014, NIPS.

[6]  Suvrit Sra,et al.  Geometric Mean Metric Learning , 2016, ICML.

[7]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[8]  Han Zhao,et al.  Unsupervised Domain Adaptation with a Relaxed Covariate Shift Assumption , 2017, AAAI.

[9]  Nathan Srebro,et al.  Stochastic optimization for deep CCA via nonlinear orthogonal iterations , 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[10]  Fanjiang Xu,et al.  Cross-Domain Metric Learning Based on Information Theory , 2014, AAAI.

[11]  Tinne Tuytelaars,et al.  Subspace Alignment For Domain Adaptation , 2014, ArXiv.

[12]  Brian C. Lovell,et al.  Unsupervised Domain Adaptation by Domain Invariant Projection , 2013, 2013 IEEE International Conference on Computer Vision.

[13]  Kenji Fukumizu,et al.  Statistical Convergence of Kernel CCA , 2005, NIPS.

[14]  Hal Daumé,et al.  Frustratingly Easy Domain Adaptation , 2007, ACL.

[15]  Kevin P. Murphy,et al.  Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.

[16]  Hans-Peter Kriegel,et al.  Integrating structured biological data by Kernel Maximum Mean Discrepancy , 2006, ISMB.

[17]  Nicolas Courty,et al.  Optimal Transport for Domain Adaptation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[19]  Marc Sebban,et al.  Metric Learning , 2015, Metric Learning.

[20]  Mikhail Belkin,et al.  Convergence of Laplacian Eigenmaps , 2006, NIPS.

[21]  Margaret Lech,et al.  Evaluating deep learning architectures for Speech Emotion Recognition , 2017, Neural Networks.

[22]  Ronald R. Coifman,et al.  Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators , 2005, NIPS.

[23]  Ivor W. Tsang,et al.  Domain Adaptation via Transfer Component Analysis , 2009, IEEE Transactions on Neural Networks.

[24]  R. Bhatia Positive Definite Matrices , 2007 .

[25]  Kate Saenko,et al.  Return of Frustratingly Easy Domain Adaptation , 2015, AAAI.

[26]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Gabriel Peyré,et al.  Computational Optimal Transport , 2018, Found. Trends Mach. Learn..