Scale-Invariant Divergences for Density Functions

A divergence is a discrepancy measure between two objects, such as functions, vectors, or matrices. In particular, divergences defined on probability distributions are widely employed in probabilistic forecasting. As a dissimilarity measure, a divergence should satisfy certain conditions. In this paper, we consider two: first, scale invariance, and second, the requirement that the divergence can be approximated by the sample mean of a loss function. The first requirement is an important feature of dissimilarity measures because the value of a divergence generally depends on the system of measurement used to measure the objects; a scale-invariant divergence transforms in a consistent way when one system of measurement is replaced by another. The second requirement is formalized as the condition that the divergence be expressed through a so-called composite score. We study the relation between composite scores and scale-invariant divergences, and we propose a new class of divergences, called Hölder divergences, that satisfies both conditions. We present some theoretical properties of Hölder divergences and show that they unify existing divergences from the viewpoint of scale invariance.
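The abstract does not state the Hölder divergence formula, but the scale-invariance property it describes can be illustrated with the γ-divergence, a known scale-invariant divergence (treated here, as an assumption, as a representative member of the scale-invariant class discussed in the paper). The sketch below discretizes two density functions on a grid and checks numerically that the γ-divergence is unchanged when either argument is multiplied by a positive constant, i.e., when the "system of measurement" is rescaled.

```python
import math

def d_gamma(p, q, dx, gamma=0.5):
    # gamma-divergence between two nonnegative functions p, q,
    # given as lists of values on a grid with spacing dx.
    # D_gamma(p, q) = log(int p^{g+1}) / (g(g+1))
    #               - log(int p q^g) / g
    #               + log(int q^{g+1}) / (g+1)
    s_pp = sum(pi ** (gamma + 1) for pi in p) * dx
    s_pq = sum(pi * qi ** gamma for pi, qi in zip(p, q)) * dx
    s_qq = sum(qi ** (gamma + 1) for qi in q) * dx
    return (math.log(s_pp) / (gamma * (gamma + 1))
            - math.log(s_pq) / gamma
            + math.log(s_qq) / (gamma + 1))

# Two Gaussian-shaped (unnormalized) densities on a grid over [-5, 5].
dx = 0.01
xs = [i * dx - 5.0 for i in range(1001)]
p = [math.exp(-x * x / 2) for x in xs]
q = [math.exp(-(x - 1.0) ** 2 / 2) for x in xs]

d0 = d_gamma(p, q, dx)
# Rescale each argument by an arbitrary positive constant:
# the log terms cancel, so the divergence value is unchanged.
d_scaled = d_gamma([3.0 * v for v in p], [0.25 * v for v in q], dx)

print(abs(d0 - d_scaled) < 1e-9)      # scale invariance holds
print(abs(d_gamma(p, p, dx)) < 1e-9)  # D(p, p) = 0
```

Because the divergence is built from logarithms of integrals, multiplying an argument by a constant c only shifts each term by a multiple of log c, and the coefficients are chosen so that these shifts cancel; this is the algebraic mechanism behind the invariance property highlighted in the abstract.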
