Learning Wasserstein Embeddings

The Wasserstein distance has recently received considerable attention in the machine learning community, especially for its principled way of comparing distributions. It has found numerous applications in hard problems such as domain adaptation, dimensionality reduction, and generative models. However, its use is still limited by a heavy computational cost. Our goal is to alleviate this problem by providing an approximation mechanism that sidesteps this inherent complexity. It relies on finding an embedding in which the Euclidean distance mimics the Wasserstein distance. We show that such an embedding can be found with a siamese architecture paired with a decoder network that maps from the embedding space back to the original input space. Once this embedding has been found, optimization problems in the Wasserstein space (e.g., barycenters, principal directions, or even archetypes) can be solved extremely fast. Numerical experiments on image datasets support this idea and show the wide potential benefits of our method.
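
To make concrete what "an embedding where the Euclidean distance mimics the Wasserstein distance" means, the sketch below uses a special case in which such an embedding is known in closed form: one-dimensional empirical distributions with the same number of equally weighted samples, where sorting the samples gives an exact embedding for the 2-Wasserstein distance. This is not the learned siamese network described above. It only illustrates the property that network is trained to approximate. The names `embed`, `decode`, and `w2_bruteforce` are hypothetical helpers introduced for this illustration, with `embed` and `decode` standing in for the encoder and decoder.

```python
import itertools
import numpy as np

def embed(samples):
    """Hypothetical stand-in for the encoder: sort n samples and scale by 1/sqrt(n)."""
    x = np.sort(np.asarray(samples, dtype=float))
    return x / np.sqrt(x.size)

def decode(z):
    """Hypothetical stand-in for the decoder: map an embedding back to sorted samples."""
    z = np.asarray(z, dtype=float)
    return z * np.sqrt(z.size)

def w2_bruteforce(a, b):
    """Exact 2-Wasserstein distance between two equal-size 1D point clouds.

    Tries every one-to-one matching, so it is only usable for tiny inputs.
    It serves as a slow reference to check the embedding against.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    best = min(
        np.mean((a - b[list(perm)]) ** 2)
        for perm in itertools.permutations(range(b.size))
    )
    return float(np.sqrt(best))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=6)
b = rng.normal(3.0, 0.5, size=6)

# Euclidean distance in the embedding space versus the exact Wasserstein distance.
d_embed = float(np.linalg.norm(embed(a) - embed(b)))
d_exact = w2_bruteforce(a, b)

# A Wasserstein barycenter of the two distributions becomes a plain average
# of embeddings, followed by a decoding step back to the input space.
bary = decode(0.5 * (embed(a) + embed(b)))

print(d_embed, d_exact)
```

In this special case the two printed distances agree up to floating-point error, and the barycenter costs one average and one decoding step instead of an optimization over transport plans. The method in the abstract aims for the same kind of shortcut in settings such as images, where no closed-form embedding is available and the encoder and decoder have to be learned.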
