Sharp Inequalities for $f$-Divergences

$f$-divergences form a general class of divergences between probability measures that includes, as special cases, many divergences commonly used in probability, mathematical statistics, and information theory, such as the Kullback-Leibler divergence, the chi-squared divergence, the squared Hellinger distance, and the total variation distance. In this paper, we study the problem of maximizing or minimizing an $f$-divergence between two probability measures subject to a finite number of constraints on other $f$-divergences. We show that these infinite-dimensional optimization problems can all be reduced to tractable optimization problems over small finite-dimensional spaces. Our results lead to a comprehensive and unified treatment of the problem of obtaining sharp inequalities between $f$-divergences. We demonstrate that many existing inequalities between $f$-divergences arise as special cases of our results, and we also improve on several existing non-sharp inequalities.
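For concreteness, here is the standard definition underlying the abstract (a brief sketch following the usual conventions in the $f$-divergence literature; the paper's own notation and regularity conditions may differ). For a convex function $f : (0, \infty) \to \mathbb{R}$ with $f(1) = 0$, the $f$-divergence between probability measures $P$ and $Q$ with $P \ll Q$ is
$$D_f(P \,\|\, Q) := \int f\!\left(\frac{dP}{dQ}\right) dQ.$$
The divergences named above correspond to the generators $f(x) = x \log x$ (Kullback-Leibler), $f(x) = (x-1)^2$ (chi-squared), $f(x) = (\sqrt{x} - 1)^2$ (squared Hellinger), and $f(x) = \tfrac{1}{2}|x - 1|$ (total variation). A prototypical sharp inequality of the kind studied here is Pinsker's inequality, $V(P, Q) \le \sqrt{D(P \,\|\, Q)/2}$, which bounds the total variation distance $V$ by the Kullback-Leibler divergence $D$ (in nats) with the best possible constant.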
