Improved finite-sample estimate of a nonparametric f-divergence

Information divergence functions measure the dissimilarity between probability density functions. We focus on the case where we have only samples from the two distributions and no knowledge of the underlying models from which the data were drawn. In this scenario, we consider an f-divergence, the Dp divergence, for which there exists an asymptotically consistent, nonparametric estimator based on minimum spanning trees. Nonparametric estimators are known to converge slowly in higher dimensions (d > 4), resulting in a large bias for small datasets. Based on experimental validation, we conjecture that the original estimator follows a power-law convergence model and introduce a new estimator, based on a bootstrap sampling scheme, with reduced bias. Experiments on real and artificial data show that the new estimator yields improved estimates of the Dp divergence compared with the original estimator.
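
To make the construction concrete, the sketch below estimates the Dp divergence from two samples by building a Euclidean minimum spanning tree over the pooled data and counting the Friedman-Rafsky cross-edges, and then illustrates one way a power-law extrapolation over bootstrap subsamples could reduce finite-sample bias. The function names (`dp_divergence`, `extrapolated_dp`), the SciPy-based MST construction, and the specific subsample fractions and fitted model are illustrative assumptions; in particular, the extrapolation step is a sketch under the conjectured power-law model, not the paper's exact bootstrap scheme.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist


def dp_divergence(X, Y):
    """MST-based estimate of the Dp divergence between samples X and Y.

    Builds a Euclidean minimum spanning tree over the pooled sample and counts
    the Friedman-Rafsky statistic R (MST edges joining points from different
    samples); the plug-in estimate is 1 - R * (n + m) / (2 * n * m).
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n, m = len(X), len(Y)
    pooled = np.vstack([X, Y])
    labels = np.r_[np.zeros(n), np.ones(m)]        # sample membership
    dists = cdist(pooled, pooled)                  # dense pairwise distances
    # Note: exact duplicate points (zero distance) are treated as
    # unconnected by csgraph; jitter the data if duplicates are expected.
    mst = minimum_spanning_tree(dists)             # sparse matrix, n+m-1 edges
    rows, cols = mst.nonzero()
    R = np.count_nonzero(labels[rows] != labels[cols])  # cross-sample edges
    return 1.0 - R * (n + m) / (2.0 * n * m)


def extrapolated_dp(X, Y, fractions=(0.4, 0.55, 0.7, 0.85, 1.0),
                    n_boot=20, seed=0):
    """Illustrative bias-reduction sketch (assumed scheme, not the paper's).

    Re-estimates Dp on random subsamples of increasing size N, fits the
    conjectured power-law model D(N) = D_inf + c * N**(-alpha), and returns
    the extrapolated value D_inf as a reduced-bias estimate.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    sizes, means = [], []
    for frac in fractions:
        nx, ny = max(2, int(frac * len(X))), max(2, int(frac * len(Y)))
        vals = [dp_divergence(X[rng.choice(len(X), nx, replace=False)],
                              Y[rng.choice(len(Y), ny, replace=False)])
                for _ in range(n_boot)]
        sizes.append(nx + ny)
        means.append(np.mean(vals))

    def model(N, d_inf, c, alpha):
        return d_inf + c * N ** (-alpha)

    popt, _ = curve_fit(model, np.asarray(sizes, dtype=float),
                        np.asarray(means), p0=[means[-1], 1.0, 0.5],
                        maxfev=10000)
    return popt[0]                                 # extrapolated D_inf
```

As a sanity check on the plug-in estimator: when the two distributions are identical, the expected number of cross-sample MST edges is roughly 2nm/(n + m), so the estimate is near zero; when the samples are well separated, few edges cross and the estimate approaches one.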
