Hierarchical Optimal Transport for Multimodal Distribution Alignment

In many machine learning applications, it is necessary to meaningfully aggregate, through alignment, different but related datasets. Optimal transport (OT)-based approaches pose alignment as a divergence minimization problem: the aim is to transform a source dataset to match a target dataset using the Wasserstein distance as a divergence measure. We introduce a hierarchical formulation of OT which leverages clustered structure in data to improve alignment in noisy, ambiguous, or multimodal settings. To solve this numerically, we propose a distributed ADMM algorithm that also exploits the Sinkhorn distance, thus it has an efficient computational complexity that scales quadratically with the size of the largest cluster. When the transformation between two datasets is unitary, we provide performance guarantees that describe when and how well aligned cluster correspondences can be recovered with our formulation, as well as provide worst-case dataset geometry for such a strategy. We apply this method to synthetic datasets that model data as mixtures of low-rank Gaussians and study the impact that different geometric properties of the data have on alignment. Next, we applied our approach to a neural decoding application where the goal is to predict movement directions and instantaneous velocities from populations of neurons in the macaque primary motor cortex. Our results demonstrate that when clustered structure exists in datasets, and is consistent across trials or time points, a hierarchical alignment strategy that leverages such structure can provide significant improvements in cross-domain alignment.

[1]  Wotao Yin,et al.  Global Convergence of ADMM in Nonconvex Nonsmooth Optimization , 2015, Journal of Scientific Computing.

[2]  Maks Ovsjanikov,et al.  Functional maps , 2012, ACM Trans. Graph..

[3]  Aswin C. Sankaranarayanan,et al.  Greedy feature selection for subspace clustering , 2013, J. Mach. Learn. Res..

[4]  Shiguang Shan,et al.  Generalized Unsupervised Manifold Alignment , 2014, NIPS.

[5]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[6]  Bruno A. Olshausen,et al.  Learning transport operators for image manifolds , 2009, NIPS.

[7]  Yonina C. Eldar,et al.  Robust Recovery of Signals From a Structured Union of Subspaces , 2008, IEEE Transactions on Information Theory.

[8]  Andriy Myronenko,et al.  Point Set Registration: Coherent Point Drift , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Justin Solomon,et al.  Hierarchical Optimal Transport for Document Representation , 2019, NeurIPS.

[10]  Kate Saenko,et al.  Subspace Distribution Alignment for Unsupervised Domain Adaptation , 2015, BMVC.

[11]  Meng Zhang,et al.  Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction , 2017, EMNLP.

[12]  Edouard Grave,et al.  Unsupervised Alignment of Embeddings with Wasserstein Procrustes , 2018, AISTATS.

[13]  Gabriel Peyré,et al.  Sample Complexity of Sinkhorn Divergences , 2018, AISTATS.

[14]  Tinne Tuytelaars,et al.  Unsupervised Visual Domain Adaptation Using Subspace Alignment , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Tommi S. Jaakkola,et al.  Towards Optimal Transport with Global Invariances , 2018, AISTATS.

[16]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Dimitri P. Bertsekas,et al.  On the Douglas—Rachford splitting method and the proximal point algorithm for maximal monotone operators , 1992, Math. Program..

[18]  Sridhar Mahadevan,et al.  Manifold alignment using Procrustes analysis , 2008, ICML '08.

[19]  Anand Rangarajan,et al.  The Softassign Procrustes Matching Algorithm , 1997, IPMI.

[20]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Philip Wolfe,et al.  Contributions to the theory of games , 1953 .

[22]  L. Kantorovich On a Problem of Monge , 2006 .

[23]  Leonidas J. Guibas,et al.  Shape google: Geometric words and expressions for invariant shape retrieval , 2011, TOGS.

[24]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[25]  Chang Wang,et al.  A General Framework for Manifold Alignment , 2009, AAAI Fall Symposium: Manifold Learning and Its Applications.

[26]  Eva L. Dyer,et al.  A cryptography-based approach for movement decoding , 2016, Nature Biomedical Engineering.

[27]  John von Neumann,et al.  1. A Certain Zero-sum Two-person Game Equivalent to the Optimal Assignment Problem , 1953 .

[28]  Bernhard Schölkopf,et al.  Domain Adaptation with Conditional Transferable Components , 2016, ICML.

[29]  Adel Javanmard,et al.  Perturbation Bounds for Procrustes, Classical Scaling, and Trilateration, with Applications to Manifold Learning , 2018, J. Mach. Learn. Res..

[30]  Rama Chellappa,et al.  Generalized Domain-Adaptive Dictionaries , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Ron Kimmel,et al.  Generalized multidimensional scaling: A framework for isometry-invariant partial surface matching , 2006, Proceedings of the National Academy of Sciences of the United States of America.

[32]  Philip S. Yu,et al.  Transfer Learning on Heterogenous Feature Spaces via Spectral Transformation , 2010, 2010 IEEE International Conference on Data Mining.

[33]  Philip S. Yu,et al.  Transfer Joint Matching for Unsupervised Domain Adaptation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Paul J. Besl,et al.  Method for registration of 3-D shapes , 1992, Other Conferences.

[35]  Rushil Anirudh,et al.  Multiple Subspace Alignment Improves Domain Adaptation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[36]  René Vidal,et al.  Sparse Subspace Clustering: Algorithm, Theory, and Applications , 2012, IEEE transactions on pattern analysis and machine intelligence.

[37]  Christoph Schnörr,et al.  A Hierarchical Approach to Optimal Transport , 2013, SSVM.

[38]  Ivor W. Tsang,et al.  Domain Adaptation via Transfer Component Analysis , 2009, IEEE Transactions on Neural Networks.

[39]  R. Dudley The Speed of Mean Glivenko-Cantelli Convergence , 1969 .

[40]  Tommi S. Jaakkola,et al.  Structured Optimal Transport , 2018, AISTATS.

[41]  Anand Rangarajan,et al.  A new point matching algorithm for non-rigid registration , 2003, Comput. Vis. Image Underst..

[42]  Nicolas Courty,et al.  Optimal Transport for Domain Adaptation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Yueting Zhuang,et al.  Sparse Unsupervised Dimensionality Reduction for Multiple View Data , 2012, IEEE Transactions on Circuits and Systems for Video Technology.

[44]  Gary K. L. Tam,et al.  Registration of 3D Point Clouds and Meshes: A Survey from Rigid to Nonrigid , 2013, IEEE Transactions on Visualization and Computer Graphics.

[45]  Nicolas Courty,et al.  Joint distribution optimal transportation for domain adaptation , 2017, NIPS.

[46]  Debasmit Das,et al.  Unsupervised Domain Adaptation Using Regularized Hyper-Graph Matching , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[47]  Julien Rabin,et al.  Regularized Discrete Optimal Transport , 2013, SIAM J. Imaging Sci..

[48]  Kate Saenko,et al.  Return of Frustratingly Easy Domain Adaptation , 2015, AAAI.

[49]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[50]  Motoaki Kawanabe,et al.  Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation , 2007, NIPS.

[51]  Gabriel Peyré,et al.  Computational Optimal Transport , 2018, Found. Trends Mach. Learn..

[52]  Eva L. Dyer,et al.  Latent Factors and Dynamics in Motor Cortex and Their Application to Brain–Machine Interfaces , 2018, The Journal of Neuroscience.

[53]  F. Bach,et al.  Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance , 2017, Bernoulli.

[54]  Brian C. Lovell,et al.  Unsupervised Domain Adaptation by Domain Invariant Projection , 2013, 2013 IEEE International Conference on Computer Vision.

[55]  Rama Chellappa,et al.  Domain adaptation for object recognition: An unsupervised approach , 2011, 2011 International Conference on Computer Vision.