There are two geometrical structures on a manifold of probability distributions. One is invariant and based on the Fisher information; the other is based on the Wasserstein distance of optimal transportation. We propose a unified framework that connects the Wasserstein distance and the Kullback-Leibler (KL) divergence to give a new information-geometrical theory. We consider the discrete case consisting of n elements and study the geometry of the probability simplex \(S_{n-1}\), the set of all probability distributions over n atoms. The Wasserstein distance is introduced on \(S_{n-1}\) by the optimal transportation of commodities from a distribution \(\boldsymbol{p} \in S_{n-1}\) to \(\boldsymbol{q} \in S_{n-1}\). We relax the optimal transportation problem by an entropy term, following Cuturi (2013) [2], and show that the entropy-relaxed transportation plan naturally defines an exponential family and the dually flat structure of information geometry. Although the optimal transportation cost itself does not define a distance function, we introduce a novel divergence function on \(S_{n-1}\) that connects the relaxed Wasserstein distance to the KL-divergence through a single parameter.
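To make the entropy relaxation concrete, the following is a minimal sketch (not code from the paper) of the entropy-relaxed transportation plan computed by Sinkhorn matrix scaling in the sense of Cuturi (2013) [2]. The function name sinkhorn_plan, the ground-cost matrix M, and the regularization parameter eps are illustrative assumptions; the convention here penalizes the linear cost \(\langle P, M \rangle\) by subtracting eps times the entropy of the plan, so the plan takes the exponential form \(P_{ij} = u_i e^{-M_{ij}/\mathrm{eps}} v_j\).

import numpy as np

def sinkhorn_plan(p, q, M, eps=0.1, n_iter=1000, tol=1e-9):
    """Entropy-relaxed optimal transport plan between p and q in the simplex.

    Minimizes <P, M> - eps * H(P) subject to the marginal constraints
    P @ 1 = p and P.T @ 1 = q, via Sinkhorn matrix scaling (Cuturi, 2013).
    All names and the sign convention for eps are illustrative assumptions.
    """
    K = np.exp(-M / eps)                  # Gibbs kernel; smaller eps -> closer to the unrelaxed plan
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(n_iter):
        u_prev = u
        v = q / (K.T @ u)                 # scale columns to match the marginal q
        u = p / (K @ v)                   # scale rows to match the marginal p
        if np.max(np.abs(u - u_prev)) < tol:
            break
    return u[:, None] * K * v[None, :]    # P = diag(u) K diag(v)

# Example: transport between two distributions on three atoms placed on a line.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.3, 0.6])
x = np.arange(3.0)
M = (x[:, None] - x[None, :]) ** 2        # squared-distance ground cost
P = sinkhorn_plan(p, q, M)
print(P.sum(axis=1), P.sum(axis=0))       # recovers p and q up to tolerance
print((P * M).sum())                      # entropy-relaxed transportation cost

Because the plan factors as \(P_{ij} = u_i e^{-M_{ij}/\mathrm{eps}} v_j\), it forms an exponential family in the scaling potentials, which is consistent with the dually flat structure described above.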
[1] David Avis et al. Ground metric learning. J. Mach. Learn. Res., 2011.
[2] Marco Cuturi et al. Sinkhorn Distances: Lightspeed Computation of Optimal Transport. NIPS, 2013.
[3] Shun-ichi Amari et al. Information Geometry and Its Applications. 2016.
[4] Gabriel Peyré et al. A Smoothed Dual Approach for Variational Wasserstein Problems. SIAM J. Imaging Sci., 2015.
[5] N. N. Chentsov. Statistical decision rules and optimal inference. 1982.
[6] C. Villani. Topics in Optimal Transportation. 2003.
[7] C. R. Rao et al. Information and the Accuracy Attainable in the Estimation of Statistical Parameters. 1992.