In information geometry, a strictly convex and smooth function induces a dually flat Hessian manifold equipped with a pair of dual Bregman divergences, hereby termed a Bregman manifold. Two common types of such Bregman manifolds met in statistics are (1) the exponential family manifolds induced by the cumulant functions of regular exponential families, and (2) the mixture family manifolds induced by the Shannon negentropies of statistical mixture families with prescribed linearly independent mixture components. However, the differential entropy of a mixture of continuous probability densities sharing the same support is hitherto not known in closed form, which makes the implementation of mixture family manifolds difficult in practice. In this work, we report an exception: the family of mixtures of two prescribed and distinct Cauchy distributions. We exemplify the explicit construction of a dually flat manifold induced by the differential negentropy for this very particular setting. This construction allows one to use the geometric toolbox of Bregman algorithms, and to obtain closed-form formulas (albeit large ones) for the Kullback-Leibler divergence and the Jensen-Shannon divergence between two mixtures of two prescribed Cauchy components.

1 A quick review of the construction of Bregman manifolds

A dually flat space [16, 1] (also called a Bregman manifold in [11]) can be built from any strictly convex and smooth function $F(\theta)$ (with open convex domain $\mathrm{dom}(F) = \Theta \neq \emptyset$) of Legendre type [15]. The Legendre-Fenchel transformation of $(\Theta, F(\theta))$ yields a dual Legendre-type potential function $(H, F^*(\eta))$ (with open convex domain $\mathrm{dom}(F^*) = H$), where
$$F^*(\eta) := \sup_{\theta \in \Theta} \left\{ \theta^\top \eta - F(\theta) \right\}.$$
The Legendre-Fenchel transformation on Legendre-type functions is involutive (i.e., $(F^*)^* = F$ by the Fenchel-Moreau theorem) and induces two dual coordinate systems: $\eta(\theta) = \nabla_\theta F(\theta)$ and $\theta(\eta) = \nabla_\eta F^*(\eta)$. Thus the gradients are inverse functions of each other: $\nabla F^* = (\nabla F)^{-1}$ and $\nabla F = (\nabla F^*)^{-1}$. The Bregman manifold is equipped with a divergence
$$B_F(\theta_1 : \theta_2) := F(\theta_1) - F(\theta_2) - (\theta_1 - \theta_2)^\top \nabla F(\theta_2)$$
for the Bregman generator $F(\theta)$, and we have $B_F(\theta_1 : \theta_2) = B_{F^*}(\eta_2 : \eta_1)$. We can also express the dual Bregman divergences equivalently using mixed parameterizations with the Young-Fenchel divergences [4, 11]:
$$Y_F(\theta_1 : \eta_2) := F(\theta_1) + F^*(\eta_2) - \theta_1^\top \eta_2.$$
We have $B_F(\theta_1 : \theta_2) = Y_F(\theta_1 : \eta_2) = Y_{F^*}(\eta_2 : \theta_1) = B_{F^*}(\eta_2 : \eta_1)$. A Riemannian Hessian metric tensor [16] ${}^F g$ can be defined in the $\theta$-coordinate system by $[{}^F g]_\theta := \nabla^2_\theta F(\theta)$, with dual metric tensor ${}^{F^*} g$ expressed in the $\eta$-coordinate system by $[{}^{F^*} g]_\eta := \nabla^2_\eta F^*(\eta)$. Let $\theta = (\theta^1, \ldots, \theta^D)$ denote the contravariant coordinates and $\eta = (\eta_1, \ldots, \eta_D)$ its equivalent covariant coordinates. Let $\partial_i := \frac{\partial}{\partial \theta^i}$ define the primal natural basis $E := \{e_i = \partial_i\}$, and let $\partial^i := \frac{\partial}{\partial \eta_i}$ define the dual natural basis $E^* := \{e^{*i} = \partial^i\}$. We have ${}^F g(e_i, e_j) = \partial_i \partial_j F(\theta)$ and ${}^{F^*} g(e^{*i}, e^{*j}) = \partial^i \partial^j F^*(\eta)$. The Crouzeix identity [7] holds (i.e., $\nabla^2_\theta F(\theta) \, \nabla^2_\eta F^*(\eta) = I$, the identity matrix), meaning that the bases $E$ and $E^*$ are reciprocal [1, 10]: $g(e_i, e^{*j}) = \delta_i^j$, where $\delta_i^j$ is the Kronecker symbol: $\delta_i^j = 0$ if $j \neq i$, and $\delta_i^j = 1$ if $i = j$.
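As a quick numerical sanity check of these identities, here is a minimal Python sketch. It is an illustration, not the paper's construction: the generator $F(\theta) = \log(1 + e^\theta)$ (the Bernoulli cumulant function) is chosen as a stand-in because its conjugate $F^*(\eta) = \eta \log \eta + (1-\eta)\log(1-\eta)$ and both dual gradients are known in closed form, so the three expressions $B_F(\theta_1 : \theta_2) = Y_F(\theta_1 : \eta_2) = B_{F^*}(\eta_2 : \eta_1)$ can be compared directly.

```python
import math

# Stand-in Legendre-type generator: Bernoulli cumulant F(theta) = log(1 + e^theta).
# (Illustrative choice only; the paper's generator is the negentropy of a
# two-component Cauchy mixture.)

def F(theta):            # primal potential
    return math.log1p(math.exp(theta))

def grad_F(theta):       # eta(theta) = F'(theta): the dual coordinate (sigmoid)
    return 1.0 / (1.0 + math.exp(-theta))

def F_star(eta):         # conjugate: negative binary entropy, eta in (0, 1)
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)

def grad_F_star(eta):    # theta(eta) = (F*)'(eta) = logit(eta) = (F')^{-1}(eta)
    return math.log(eta / (1.0 - eta))

def bregman(G, grad_G, x1, x2):   # B_G(x1 : x2) for a 1D generator G
    return G(x1) - G(x2) - (x1 - x2) * grad_G(x2)

def young_fenchel(theta1, eta2):  # Y_F(theta1 : eta2), mixed parameterization
    return F(theta1) + F_star(eta2) - theta1 * eta2

theta1, theta2 = 0.3, -1.2
eta1, eta2 = grad_F(theta1), grad_F(theta2)

# The three expressions of the same divergence agree up to rounding:
print(bregman(F, grad_F, theta1, theta2))        # B_F(theta1 : theta2)
print(young_fenchel(theta1, eta2))               # Y_F(theta1 : eta2)
print(bregman(F_star, grad_F_star, eta2, eta1))  # B_{F*}(eta2 : eta1)
```

All three printed values coincide (approximately 0.244 for these parameters), as the Legendre duality predicts.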
A Bregman manifold has been called a dually flat space in information geometry [1, 10] because the dual potential functions $F(\theta)$ and $F^*(\eta)$ induce two affine connections ${}^F\nabla$ and ${}^{F^*}\nabla$ which are flat: the Riemann-Christoffel symbols ${}^F\Gamma_{ij}^k$ characterizing ${}^F\nabla$ vanish in the $\theta$-coordinate system (i.e., ${}^F\Gamma_{ij}^k(\theta) = 0$, and $\theta(\cdot)$ is called a ${}^F\nabla$-coordinate system), and the Riemann-Christoffel symbols ${}^{F^*}\Gamma_{ij}^k$ characterizing ${}^{F^*}\nabla$ vanish in the $\eta$-coordinate system (i.e., ${}^{F^*}\Gamma_{ij}^k(\eta) = 0$, and $\eta(\cdot)$ is called a ${}^{F^*}\nabla$-coordinate system). Furthermore, the two (torsion-free) affine connections ${}^F\nabla$ and ${}^{F^*}\nabla$ are dual with respect to the metric tensor ${}^F g$ [1, 10], so that the mid-connection coincides with the Levi-Civita metric connection:
$$\frac{{}^F\nabla + {}^{F^*}\nabla}{2} = {}^{\mathrm{LC}}\nabla,$$
where ${}^{\mathrm{LC}}\nabla$ denotes the Levi-Civita connection induced by the Hessian metric ${}^F g$. Two common examples of dually flat spaces of statistical models are the exponential family manifolds [1, 10], built from regular exponential families [3] by setting the Bregman generators to the cumulant functions of the families, and the mixture family manifolds, induced by the negentropy of a statistical mixture with prescribed linearly independent component distributions [1, 10]. The family of categorical distributions (also called multinoulli distributions) is both an exponential family and a mixture family. It is interesting to notice that the cumulant functions of regular exponential families are always analytic ($C^\omega$, see [3]), i.e., $F(\theta)$ admits a locally converging Taylor series at any $\theta \in \Theta$, but the negentropy of a mixture may not be analytic (e.g., the negentropy of a mixture of two normal distributions [17]). To use the toolbox of geometric algorithms on Bregman manifolds (e.g., [2, 5]), one needs the generators $F$ and $F^*$ and their gradients $\nabla F$ and $\nabla F^*$ in closed form. This may not always be possible [12]: either the generator $F$ itself may lack a closed form (as is typically the case for mixture negentropies), or the conjugate $F^*$ and the inverse gradient $\nabla F^* = (\nabla F)^{-1}$ may not be available in closed form; the sketch below illustrates the numerical fallback for the mixture-family generator.
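To make the mixture-family generator concrete, the following sketch evaluates the negentropy $F(\theta) = -h(m_\theta) = \int m_\theta(x) \log m_\theta(x) \, \mathrm{d}x$ of the mixture $m_\theta = (1-\theta) p_0 + \theta p_1$ by numerical quadrature. The two Cauchy components (locations 0 and 2, unit scales) are arbitrary choices for illustration, not the paper's prescribed components; quadrature is the generic fallback when no closed form is known, and the paper's contribution is precisely that for two distinct Cauchy components it can be bypassed.

```python
import math
from scipy.integrate import quad

# Generic (numerical) evaluation of the mixture-family Bregman generator:
# F(theta) = -h(m_theta), the Shannon negentropy of the mixture density
# m_theta(x) = (1 - theta) p0(x) + theta p1(x), for theta in (0, 1).

def cauchy(x, loc, scale):
    return scale / (math.pi * (scale**2 + (x - loc)**2))

def p0(x): return cauchy(x, 0.0, 1.0)   # prescribed component 1 (example choice)
def p1(x): return cauchy(x, 2.0, 1.0)   # prescribed component 2 (example choice)

def F(theta):
    # integrand m log m -> 0 at +/- infinity, so the improper integral converges
    def integrand(x):
        m = (1.0 - theta) * p0(x) + theta * p1(x)
        return m * math.log(m)
    val, _err = quad(integrand, -math.inf, math.inf)
    return val  # negentropy F(theta) = -h(m_theta), strictly convex on (0, 1)

print(F(0.25), F(0.5), F(0.75))
```

The dual coordinate $\eta(\theta) = F'(\theta)$ would then also have to be obtained numerically (e.g., by finite differences or by differentiating under the integral sign), which is what a closed-form generator spares us.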
References

[1] Zhenning Zhang et al. Information geometry of the power inverse Gaussian distribution, 2007.
[2] Richard Nock et al. On Bregman Voronoi diagrams, SODA '07, 2007.
[3] Giovanni Pistone et al. Information Geometry of the Gaussian Distribution in View of Stochastic Optimization, FOGA, 2015.
[4] F. Nielsen. On geodesic triangles with right angles in a dually flat space, Signals and Communication Technology, 2019.
[5] Frank Nielsen et al. An Elementary Introduction to Information Geometry, Entropy, 2018.
[6] Jean-Pierre Crouzeix et al. A relationship between the second derivatives of a convex function and of its conjugate, Math. Program., 1977.
[7] Frank Nielsen et al. On $f$-divergences between Cauchy distributions, GSI, 2021.
[8] Frank Nielsen et al. On the Geometry of Mixtures of Prescribed Distributions, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
[9] Keisuke Yamazaki et al. Kullback Information of Normal Mixture is not an Analytic Function, 2004.
[10] Inderjit S. Dhillon et al. Clustering with Bregman Divergences, J. Mach. Learn. Res., 2005.
[11] Jianhua Lin et al. Divergence measures based on the Shannon entropy, IEEE Trans. Inf. Theory, 1991.
[12] O. Barndorff-Nielsen. Information and Exponential Families in Statistical Theory, 1980.
[13] Hirohiko Shima et al. Geometry of Hessian Structures, GSI, 2013.
[14] R. Rockafellar. Conjugates and Legendre Transforms of Convex Functions, Canadian Journal of Mathematics, 1967.
[15] Huafei Sun et al. The geometry of the Dirichlet manifold, 2008.
[16] André F. T. Martins et al. Learning with Fenchel-Young Losses, J. Mach. Learn. Res., 2020.
[17] Frank Nielsen et al. Monte Carlo Information-Geometric Structures, Geometric Structures of Information, 2018.
[18] Frank Nielsen et al. A closed-form formula for the Kullback-Leibler divergence between Cauchy distributions, arXiv, 2019.
[19] F. Opitz. Information geometry and its applications, 2012 9th European Radar Conference, 2012.