Non-parametric estimation of integral probability metrics

In this paper, we develop and analyze a nonparametric method for estimating the class of integral probability metrics (IPMs), examples of which include the Wasserstein distance, Dudley metric, and maximum mean discrepancy (MMD). We show that these distances can be estimated efficiently by solving a linear program in the case of Wasserstein distance and Dudley metric, while MMD is computable in a closed form. All these estimators are shown to be strongly consistent and their convergence rates are analyzed. Based on these results, we show that IPMs are simple to estimate and the estimators exhibit good convergence behavior compared to ø-divergence estimators.

[1]  A. Müller Integral Probability Metrics and Their Generating Classes of Functions , 1997, Advances in Applied Probability.

[2]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[3]  L. Devroye,et al.  Nonparametric density estimation : the L[1] view , 1987 .

[4]  S. Rachev On a Class of Minimal Functionals on a Space of Probability Measures , 1985 .

[5]  S. Rachev The Monge–Kantorovich Mass Transference Problem and Its Stochastic Applications , 1985 .

[6]  Bernhard Schölkopf,et al.  Kernel Measures of Conditional Dependence , 2007, NIPS.

[7]  G. Shorack Probability for Statisticians , 2000 .

[8]  S. M. Ali,et al.  A General Class of Coefficients of Divergence of One Distribution from Another , 1966 .

[9]  Qing Wang,et al.  Divergence estimation of continuous distributions based on data-dependent partitions , 2005, IEEE Transactions on Information Theory.

[10]  R. Gray,et al.  A Generalization of Ornstein's $\bar d$ Distance with Applications to Information Theory , 1975 .

[11]  Bernhard Schölkopf,et al.  A Kernel Method for the Two-Sample-Problem , 2006, NIPS.

[12]  Peter Harremoës,et al.  Refinements of Pinsker's inequality , 2003, IEEE Trans. Inf. Theory.

[13]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[14]  Le Song,et al.  A Kernel Statistical Test of Independence , 2007, NIPS.

[15]  Jon A. Wellner,et al.  Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[16]  N. Aronszajn Theory of Reproducing Kernels. , 1950 .

[17]  L. Devroye,et al.  Nonparametric Density Estimation: The L 1 View. , 1985 .

[18]  Ulrike von Luxburg,et al.  Distance-Based Classification with Lipschitz Functions , 2004, J. Mach. Learn. Res..

[19]  Kenji Fukumizu,et al.  On integral probability metrics, φ-divergences and binary classification , 2009, 0901.2698.

[20]  Martin J. Wainwright,et al.  Nonparametric estimation of the likelihood ratio and divergence functionals , 2007, 2007 IEEE International Symposium on Information Theory.

[21]  Martin J. Wainwright,et al.  Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization , 2008, IEEE Transactions on Information Theory.

[22]  Dudley,et al.  Real Analysis and Probability: Measurability: Borel Isomorphism and Analytic Sets , 2002 .