There are two geometrical structures on a manifold of probability distributions. One is invariant and based on the Fisher information; the other is based on the Wasserstein distance of optimal transportation. We propose a unified framework that connects the Wasserstein distance and the Kullback-Leibler (KL) divergence to give a new information-geometrical theory. We consider the discrete case consisting of n elements and study the geometry of the probability simplex \(S_{n-1}\), the set of all probability distributions over n atoms. The Wasserstein distance is introduced on \(S_{n-1}\) by the optimal transportation of commodities from a distribution \(\boldsymbol{p} \in S_{n-1}\) to \(\boldsymbol{q} \in S_{n-1}\). We relax the optimal transportation problem by an entropy term, following Cuturi (2013) [2], and show that the entropy-relaxed transportation plan naturally defines an exponential family and the dually flat structure of information geometry. Although the optimal transportation cost itself does not define a distance function, we introduce a novel divergence function on \(S_{n-1}\) that connects the relaxed Wasserstein distance to the KL-divergence through a single parameter.
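To make the entropy relaxation concrete, the following is a minimal sketch (not code from the paper) of the entropy-relaxed transportation plan computed by Sinkhorn matrix scaling in the sense of Cuturi (2013) [2]. The function name sinkhorn_plan, the ground-cost matrix M, and the regularization parameter eps are illustrative assumptions; the convention here penalizes the linear cost \(\langle P, M \rangle\) by subtracting eps times the entropy of the plan, so the plan takes the exponential form \(P_{ij} = u_i e^{-M_{ij}/\mathrm{eps}} v_j\).

import numpy as np

def sinkhorn_plan(p, q, M, eps=0.1, n_iter=1000, tol=1e-9):
    """Entropy-relaxed optimal transport plan between p and q in the simplex.

    Minimizes <P, M> - eps * H(P) subject to the marginal constraints
    P @ 1 = p and P.T @ 1 = q, via Sinkhorn matrix scaling (Cuturi, 2013).
    All names and the sign convention for eps are illustrative assumptions.
    """
    K = np.exp(-M / eps)                  # Gibbs kernel; smaller eps -> closer to the unrelaxed plan
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(n_iter):
        u_prev = u
        v = q / (K.T @ u)                 # scale columns to match the marginal q
        u = p / (K @ v)                   # scale rows to match the marginal p
        if np.max(np.abs(u - u_prev)) < tol:
            break
    return u[:, None] * K * v[None, :]    # P = diag(u) K diag(v)

# Example: transport between two distributions on three atoms placed on a line.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.3, 0.6])
x = np.arange(3.0)
M = (x[:, None] - x[None, :]) ** 2        # squared-distance ground cost
P = sinkhorn_plan(p, q, M)
print(P.sum(axis=1), P.sum(axis=0))       # recovers p and q up to tolerance
print((P * M).sum())                      # entropy-relaxed transportation cost

Because the plan factors as \(P_{ij} = u_i e^{-M_{ij}/\mathrm{eps}} v_j\), it forms an exponential family in the scaling potentials, which is consistent with the dually flat structure described above.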
[1] David Avis et al. Ground metric learning. J. Mach. Learn. Res., 2011.
[2] Marco Cuturi et al. Sinkhorn Distances: Lightspeed Computation of Optimal Transport. NIPS, 2013.
[3] Shun-ichi Amari et al. Information Geometry and Its Applications. 2016.
[4] Gabriel Peyré et al. A Smoothed Dual Approach for Variational Wasserstein Problems. SIAM J. Imaging Sci., 2015.
[5] N. N. Chentsov. Statistical decision rules and optimal inference. 1982.
[6] C. Villani. Topics in Optimal Transportation. 2003.
[7] C. R. Rao et al. Information and the Accuracy Attainable in the Estimation of Statistical Parameters. 1992.