Hilbert Sinkhorn Divergence for Optimal Transport

The Sinkhorn divergence has become a popular metric for comparing probability distributions in optimal transport. However, most existing work computes the Sinkhorn divergence in Euclidean space, which limits its applicability to complex data with nonlinear structure. There is therefore a theoretical need to equip the Sinkhorn divergence with the ability to capture nonlinear structure. We propose a theoretical and computational framework to bridge this gap: we extend the Sinkhorn divergence from Euclidean space to a reproducing kernel Hilbert space, yielding what we term the "Hilbert Sinkhorn divergence" (HSD). In particular, kernel matrices yield a closed-form expression of the HSD, which we prove corresponds to a tractable convex optimization problem. We also establish several attractive statistical properties of the proposed HSD, namely strong consistency, asymptotic behavior, and sample complexity. Empirically, our method achieves state-of-the-art performance on image classification and topological data analysis tasks.
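To illustrate the idea of replacing the Euclidean ground cost with an RKHS-induced one, the sketch below computes a debiased Sinkhorn divergence whose pairwise costs are squared RKHS distances obtained via the kernel trick. This is only a minimal illustration, not the paper's exact closed-form formulation; the RBF kernel, the helper names (rbf_kernel, rkhs_cost, sinkhorn_cost), and the parameters gamma, eps, and n_iter are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix with k(x, y) = exp(-gamma * ||x - y||^2); the RBF choice is an assumption.
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def rkhs_cost(X, Y, gamma=1.0):
    # Squared RKHS distance via the kernel trick:
    # ||phi(x) - phi(y)||_H^2 = k(x, x) + k(y, y) - 2 k(x, y)
    kxx = rbf_kernel(X, X, gamma).diagonal()[:, None]
    kyy = rbf_kernel(Y, Y, gamma).diagonal()[None, :]
    kxy = rbf_kernel(X, Y, gamma)
    return kxx + kyy - 2.0 * kxy

def sinkhorn_cost(C, a, b, eps=0.1, n_iter=200):
    # Entropic OT cost <P, C> computed with plain Sinkhorn fixed-point iterations.
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    return (P * C).sum()

def hilbert_sinkhorn_divergence(X, Y, gamma=1.0, eps=0.1):
    # Debiased Sinkhorn divergence with the RKHS-induced ground cost:
    # S(mu, nu) = OT_eps(mu, nu) - 0.5 OT_eps(mu, mu) - 0.5 OT_eps(nu, nu)
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    return (sinkhorn_cost(rkhs_cost(X, Y, gamma), a, b, eps)
            - 0.5 * sinkhorn_cost(rkhs_cost(X, X, gamma), a, a, eps)
            - 0.5 * sinkhorn_cost(rkhs_cost(Y, Y, gamma), b, b, eps))

# Usage: compare two empirical point clouds drawn from shifted Gaussians.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 2))
Y = rng.normal(1.0, 1.0, size=(100, 2))
print(hilbert_sinkhorn_divergence(X, Y, gamma=0.5, eps=0.1))
```

The debiasing terms on the self-transport costs are what distinguish a Sinkhorn divergence from a plain entropic OT cost; swapping the Euclidean cost matrix for the kernel-induced one is the only change needed to move the computation into the RKHS.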
