On a Generalization of the Jensen–Shannon Divergence and the Jensen–Shannon Centroid

The Jensen–Shannon divergence is a renowned bounded symmetrization of the Kullback–Leibler divergence that does not require the probability densities to have matching supports. In this paper, we introduce a vector-skew generalization of the scalar α-Jensen–Bregman divergences, from which we derive the vector-skew α-Jensen–Shannon divergences. We prove that the vector-skew α-Jensen–Shannon divergences are f-divergences and study the properties of these novel divergences. Finally, we report an iterative algorithm to numerically compute the Jensen–Shannon-type centroids for a set of probability densities belonging to a mixture family; this includes the case of the Jensen–Shannon centroid of a set of categorical distributions or normalized histograms.
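To make the centroid computation concrete for the categorical case, the sketch below implements the classical Jensen–Shannon divergence and a simple fixed-point iteration for the Jensen–Shannon centroid. The update rule follows from the first-order optimality condition of minimizing the average JSD over the simplex, which yields log c_k proportional to the average of log((p_{i,k} + c_k)/2); this is an illustrative alternative sketch, not necessarily the exact iterative (CCCP-based) algorithm reported in the paper, and the function names are hypothetical.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two categorical distributions."""
    m = 0.5 * (p + q)  # mid-point mixture
    def kl(a, b):
        # Kullback-Leibler divergence with a small eps to avoid log(0).
        return np.sum(a * (np.log(a + eps) - np.log(b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_centroid(ps, iters=100):
    """Fixed-point iteration for the JS centroid of categorical distributions.

    At a critical point of c -> (1/n) sum_i JSD(p_i, c) on the simplex,
    log c_k is (up to normalization) the mean over i of log((p_{i,k} + c_k)/2),
    i.e. c is the normalized geometric mean of the mid-point mixtures.
    """
    ps = np.asarray(ps, dtype=float)
    c = ps.mean(axis=0)                  # initialize at the arithmetic mean
    for _ in range(iters):
        m = 0.5 * (ps + c)               # mixtures m_i = (p_i + c)/2
        c = np.exp(np.mean(np.log(m), axis=0))  # geometric mean across the set
        c /= c.sum()                     # renormalize onto the probability simplex
    return c
```

For example, by symmetry the JS centroid of the pair (0.2, 0.8) and (0.8, 0.2) is the uniform distribution (0.5, 0.5), and the JSD between two distributions with disjoint supports attains its upper bound log 2.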
