Principal whitened gradient for information geometry

We propose two strategies to improve gradient-based optimization in information geometry. First, a local Euclidean embedding is identified by whitening the tangent space, which leads to an additive parameter update sequence that approximates the geodesic flow toward the optimal density model. Second, removing the minor components of the gradients enhances the estimation of the Fisher information matrix and reduces the computational cost. We also prove that dimensionality reduction is necessary for learning multidimensional linear transformations. Optimization based on the resulting principal whitened gradients demonstrates faster and more robust convergence in simulations of unsupervised learning on synthetic data and of discriminant analysis on breast cancer data.
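As a rough illustration of how the two strategies combine, the sketch below whitens a gradient in the tangent space using only the principal eigencomponents of an empirical Fisher information estimate, discarding the minor components. It assumes per-sample score vectors (gradients of the log-likelihood) are available; the names `scores`, `grad`, `k`, and `lr` are illustrative placeholders, not identifiers from the paper, and the exact update rule here is an assumption rather than the paper's algorithm.

```python
import numpy as np

def principal_whitened_gradient(scores, grad, k, eps=1e-8):
    """Whiten `grad` using the top-k eigenpairs of the empirical Fisher.

    scores : (n_samples, dim) array of per-sample score vectors
    grad   : (dim,) gradient to be whitened
    k      : number of principal components to retain
    """
    # Empirical Fisher information: average outer product of the scores.
    fisher = scores.T @ scores / scores.shape[0]
    # Eigendecomposition; np.linalg.eigh returns eigenvalues in ascending order.
    eigval, eigvec = np.linalg.eigh(fisher)
    # Keep the k largest eigenpairs (the principal components).
    eigval, eigvec = eigval[-k:], eigvec[:, -k:]
    # Whitening: scale each retained direction by lambda^{-1/2};
    # the minor components are dropped entirely.
    return eigvec @ ((eigvec.T @ grad) / np.sqrt(eigval + eps))

# Usage: an additive parameter update with the whitened gradient.
rng = np.random.default_rng(0)
scores = rng.normal(size=(500, 10))   # stand-in score vectors
grad = scores.mean(axis=0)            # stand-in gradient
theta = np.zeros(10)
lr = 0.1                              # hypothetical step size
theta -= lr * principal_whitened_gradient(scores, grad, k=4)
```

Truncating to the top eigenpairs serves both stated purposes at once: the retained eigenvalues are the best-estimated part of the Fisher information, and working in a k-dimensional subspace lowers the cost of the whitening step.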