On Learning Parametric Non-Smooth Continuous Distributions

With the eventual goal of better understanding the learning rates of general continuous distributions, we derive the first essentially minimax-optimal estimators and learning rates for several natural classes of parametric non-smooth continuous distributions under KL divergence. In particular, we show that, in contrast with the folk theorem that each distribution parameter increases the learning rate by 1/(2n), non-smooth distributions exhibit a wide range of learning rates.
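The folk theorem can be illustrated with a small Monte Carlo sketch. It assumes a smooth parametric family: a distribution over k symbols (k - 1 free parameters) estimated with the add-1/2 (Krichevsky-Trofimov) rule. For this family the expected KL loss for a fixed interior distribution behaves like (k - 1)/(2n), so n times the loss should approach (k - 1)/2. The function names are hypothetical, and this is not the estimator construction of this work. The final lines contrast the non-smooth uniform family U[0, θ]. There, the maximum-likelihood estimate max X_i lies below θ almost surely, so plugging it in gives infinite KL loss. This is one reason non-smooth families call for different estimators.

```python
import numpy as np


def kl_discrete(p, q):
    """KL divergence D(p || q) in nats for probability vectors with q > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def add_half_estimate(counts):
    """Hypothetical helper: add-1/2 (Krichevsky-Trofimov) estimate from counts."""
    k = counts.size
    return (counts + 0.5) / (counts.sum() + 0.5 * k)


def scaled_kl_risk(p, n, trials, rng):
    """Monte Carlo estimate of n * E[D(p || q_hat)] for the add-1/2 estimator."""
    total = 0.0
    for _ in range(trials):
        counts = rng.multinomial(n, p)
        total += kl_discrete(p, add_half_estimate(counts))
    return n * total / trials


def kl_uniform(theta, theta_hat):
    """D(U[0, theta] || U[0, theta_hat]).

    Equals log(theta_hat / theta) when theta_hat >= theta and is infinite
    otherwise, because p then has mass where q has density zero.
    """
    if theta_hat >= theta:
        return float(np.log(theta_hat / theta))
    return float("inf")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.array([0.1, 0.2, 0.3, 0.4])  # k = 4 symbols, so 3 free parameters
    # Smooth case: n * E[KL] should be close to (k - 1) / 2 = 1.5 for large n.
    print(scaled_kl_risk(p, n=2000, trials=2000, rng=rng))
    # Non-smooth case: max(X_i) < theta, so the plug-in MLE has infinite KL loss.
    theta = 1.0
    x = rng.uniform(0.0, theta, size=1000)
    print(kl_uniform(theta, x.max()))  # inf
```

Each added symbol contributes about 1/2 to the scaled risk, which is the "1/(2n) per parameter" behavior. The uniform family already breaks the naive plug-in approach.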
