Adaptive optimal transport

An adaptive, adversarial methodology is developed for the optimal transport problem between two distributions $\mu$ and $\nu$, known only through finite sets of independent samples $(x_i)_{i=1,\dots,n}$ and $(y_j)_{j=1,\dots,m}$. The methodology automatically creates features that adapt to the data, thus avoiding reliance on a priori knowledge of the underlying distributions. Specifically, instead of a discrete point-by-point assignment, the new procedure seeks an optimal map $T(x)$ defined for all $x$, minimizing the Kullback–Leibler divergence between the transported samples $(T(x_i))$ and the target samples $(y_j)$. The relative entropy is given a sample-based, variational characterization, which creates an adversarial setting: as one player seeks to push forward one distribution to the other, the second player develops features that focus on the areas where the two distributions fail to match. The procedure solves local problems, each seeking the optimal transfer between consecutive intermediate distributions between $\mu$ and $\nu$. As a result, maps of arbitrary complexity can be built by composing the simple maps used for each local problem. Displacement interpolation is used to guarantee global optimality from local optimality. The procedure is illustrated through synthetic examples in one and two dimensions.
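To make the sample-based, variational characterization of the relative entropy concrete, the following is a minimal sketch rather than the procedure of the paper. It assumes one standard variational form, the Donsker–Varadhan formula $D_{KL}(\rho\|\nu) = \sup_g \{ E_\rho[g] - \log E_\nu[e^g] \}$, with expectations replaced by sample means; the abstract does not state which form is used. The single linear feature $g(x) = a x$, the Gaussian test data, and the names `dv_objective` and `dv_kl_estimate` are all hypothetical choices made for illustration.

```python
import numpy as np

def dv_objective(a, x, y):
    """Sample-based Donsker-Varadhan objective for the feature g(x) = a * x.

    Returns mean(g(x)) - log(mean(exp(g(y)))), a lower bound on KL(rho || nu)
    when x ~ rho and y ~ nu (up to sampling error).
    """
    z = a * y
    zmax = z.max()  # shift for a numerically stable log-mean-exp
    log_mean_exp = zmax + np.log(np.mean(np.exp(z - zmax)))
    return a * x.mean() - log_mean_exp

def dv_kl_estimate(x, y, steps=200, lr=0.5):
    """Maximize the objective over a by plain gradient ascent (concave in a)."""
    a = 0.0
    for _ in range(steps):
        z = a * y
        w = np.exp(z - z.max())
        w /= w.sum()                     # exponentially tilted weights on the y samples
        grad = x.mean() - np.sum(w * y)  # derivative of the objective with respect to a
        a += lr * grad
    return dv_objective(a, x, y), a

# Assumed test case: rho = N(0, 1), nu = N(1, 1), for which KL(rho || nu) = 1/2
# and the optimal feature is linear with slope a = -1.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=5000)
y = rng.normal(1.0, 1.0, size=5000)
kl_hat, a_hat = dv_kl_estimate(x, y)
print(f"estimated KL = {kl_hat:.3f}, feature slope a = {a_hat:.3f}")
```

In an adversarial setting of the kind described above, the maximization over the feature would play the role of the second player, while the first player would update a map $T$ applied to the samples $x_i$ so as to lower the same objective. Here the feature family is fixed and one-dimensional, which is enough only because the log-density ratio of the two assumed Gaussians is linear in $x$.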
