On Hölder Projective Divergences

We describe a framework for building distances by measuring the tightness of inequalities, and introduce the notion of proper statistical divergences and improper pseudo-divergences. We then consider the ordinary and reverse Hölder inequalities and present two novel classes of Hölder divergences and pseudo-divergences, both of which encapsulate the Cauchy–Schwarz divergence as a special case. We report closed-form formulas for these statistical dissimilarities when the distributions belong to the same exponential family, provided that the natural parameter space is a cone (e.g., multivariate Gaussians) or affine (e.g., categorical distributions). These new classes of Hölder divergences are invariant to rescaling and therefore do not require the distributions to be normalized. Finally, we show how to compute statistical Hölder centroids with respect to these divergences and carry out center-based clustering toy experiments on a set of Gaussian distributions, which demonstrate empirically that symmetrized Hölder divergences outperform the symmetric Cauchy–Schwarz divergence.
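
As a concrete illustration of the projective (scale-invariance) property, the following is a minimal numerical sketch, not taken from the paper itself. It assumes the two-parameter Hölder divergence definition D_{α,γ}(p:q) = -log( ∫ p^{γ/α} q^{γ/β} dx / ( (∫ p^γ dx)^{1/α} (∫ q^γ dx)^{1/β} ) ), where β is the conjugate exponent of α (1/α + 1/β = 1); setting α = β = γ = 2 reduces it to the Cauchy–Schwarz divergence. The grid-based trapezoidal integration is purely for illustration and stands in for the closed-form exponential-family formulas reported in the paper.

```python
import numpy as np

def holder_divergence(p, q, x, alpha=2.0, gamma=2.0):
    """Two-parameter Holder projective divergence between the (possibly
    unnormalized) densities p and q sampled on the grid x, computed by
    trapezoidal integration.  alpha = beta = gamma = 2 recovers the
    Cauchy-Schwarz divergence."""
    beta = alpha / (alpha - 1.0)  # conjugate exponent: 1/alpha + 1/beta = 1
    num = np.trapz(p ** (gamma / alpha) * q ** (gamma / beta), x)
    den = (np.trapz(p ** gamma, x) ** (1.0 / alpha)
           * np.trapz(q ** gamma, x) ** (1.0 / beta))
    return -np.log(num / den)

# Two deliberately unnormalized 1D Gaussian-shaped densities.
x = np.linspace(-10.0, 10.0, 10001)
p = 3.0 * np.exp(-0.5 * (x - 1.0) ** 2)
q = 0.5 * np.exp(-0.5 * ((x + 1.0) / 1.5) ** 2)

d_cs = holder_divergence(p, q, x, alpha=2.0, gamma=2.0)  # Cauchy-Schwarz case
d_h = holder_divergence(p, q, x, alpha=3.0, gamma=2.0)   # a proper Holder case

# Projectivity: rescaling either argument leaves the divergence unchanged,
# so the densities never need to be normalized.
assert np.isclose(d_cs, holder_divergence(7.0 * p, 0.2 * q, x, 2.0, 2.0))
print(f"Cauchy-Schwarz: {d_cs:.6f}, Holder (alpha=3, gamma=2): {d_h:.6f}")
```

Note that D(p:p) = 0 under this definition, since the numerator and the denominator both equal ∫ p^γ dx; this is the "proper divergence" behavior, whereas the Hölder pseudo-divergences mentioned in the abstract need not vanish on identical arguments.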
