Alpha/Beta Divergences and Tweedie Models

We describe the underlying probabilistic interpretation of alpha and beta divergences. We first show that beta divergences are inherently tied to Tweedie distributions, a particular class of exponential families known as exponential dispersion models. Starting from the variance function of a Tweedie model, we outline how to obtain alpha and beta divergences as special cases of Csiszár's $f$-divergences and Bregman divergences, respectively. This result directly generalizes the well-known relationship between the Gaussian distribution and least squares estimation to Tweedie models and beta divergence minimization.
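
The construction can be illustrated with a minimal numerical sketch. It assumes the Tweedie power variance function $v(\mu) = \mu^p$ and the common convention $\beta = 2 - p$, so that $p = 0, 1, 2$ correspond to the Gaussian, Poisson, and gamma models. A Bregman divergence generated by a convex $\phi$ with $\phi''(t) = 1/v(t)$ can be written as $d_\phi(x, \mu) = \int_\mu^x (x - t)/v(t)\,dt$; the hypothetical function `tweedie_bregman` evaluates this integral numerically and `beta_divergence` gives its closed form. The hypothetical `alpha_divergence` uses the Cichocki-Amari parameterization (other papers shift or rescale $\alpha$), and the sketch checks that it equals the $f$-divergence $\mu f(x/\mu)$ whose generator $f(u)$ is the beta divergence between $u$ and 1. None of these names come from the paper; this is an illustration under the stated assumptions, not the authors' code.

```python
import math


def tweedie_bregman(x, mu, p, n=2000):
    """Bregman divergence with phi''(t) = 1 / v(t) for the (assumed) Tweedie
    power variance function v(t) = t**p, computed as the integral
    d(x, mu) = int_mu^x (x - t) / t**p dt with composite Simpson's rule.
    Assumes x, mu > 0 and an even number of subintervals n."""
    h = (x - mu) / n

    def integrand(t):
        return (x - t) / t ** p

    total = integrand(mu) + integrand(x)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * integrand(mu + i * h)
    return total * h / 3


def beta_divergence(x, mu, beta):
    """Closed-form beta divergence d_beta(x | mu) for x, mu > 0."""
    if beta == 1:  # limit beta -> 1: generalized Kullback-Leibler (Poisson)
        return x * math.log(x / mu) - x + mu
    if beta == 0:  # limit beta -> 0: Itakura-Saito (gamma)
        return x / mu - math.log(x / mu) - 1
    return (x ** beta + (beta - 1) * mu ** beta
            - beta * x * mu ** (beta - 1)) / (beta * (beta - 1))


def alpha_divergence(x, mu, alpha):
    """Alpha divergence in the Cichocki-Amari parameterization, x, mu > 0."""
    if alpha == 1:  # limit alpha -> 1: generalized Kullback-Leibler
        return x * math.log(x / mu) - x + mu
    if alpha == 0:  # limit alpha -> 0: reversed generalized Kullback-Leibler
        return mu * math.log(mu / x) + x - mu
    return (x ** alpha * mu ** (1 - alpha) - alpha * x
            + (alpha - 1) * mu) / (alpha * (alpha - 1))


if __name__ == "__main__":
    x, mu = 3.0, 1.5
    for p in (0.0, 1.0, 1.5, 2.0, 3.0):
        beta = 2.0 - p  # assumed convention linking Tweedie power p and beta
        numeric = tweedie_bregman(x, mu, p)
        closed = beta_divergence(x, mu, beta)
        # alpha divergence as the f-divergence mu * f(x / mu), f(u) = d_beta(u | 1)
        as_f = mu * beta_divergence(x / mu, 1.0, beta)
        print(f"p={p}: Bregman integral={numeric:.6f}, "
              f"beta closed form={closed:.6f}, "
              f"alpha={alpha_divergence(x, mu, beta):.6f} vs f-form={as_f:.6f}")
```

For $p = 0$ the integral reduces to $(x - \mu)^2/2$, half the squared error, which is the Gaussian and least squares case; $p = 1$ and $p = 2$ give the generalized Kullback-Leibler and Itakura-Saito divergences.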
