Direct Density-Derivative Estimation and Its Application in KL-Divergence Approximation

Estimation of density derivatives is a versatile tool in statistical data analysis. A naive approach is to first estimate the density and then compute its derivative. However, such a two-step approach often performs poorly, because a good density estimator is not necessarily a good density-derivative estimator. In this paper, we give a direct method that approximates the density derivative without estimating the density itself. The proposed estimator allows analytic and computationally efficient approximation of multi-dimensional, high-order density derivatives, and all of its hyper-parameters can be chosen objectively by cross-validation. We further show that the proposed density-derivative estimator is useful for improving the accuracy of non-parametric KL-divergence estimation via metric learning. The practical superiority of the proposed method is demonstrated experimentally in change detection and feature selection.
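
To make the naive two-step baseline concrete, the following is a minimal sketch of it in one dimension: fit a Gaussian kernel density estimate, then differentiate that estimate analytically. It is not the direct estimator proposed in the paper. The function name `kde_derivative`, the choice of a Gaussian kernel, and the fixed `bandwidth` are assumptions made only for this illustration.

```python
import numpy as np


def kde_derivative(x_eval, samples, bandwidth):
    """Derivative of a 1-D Gaussian kernel density estimate (illustrative).

    The density estimate is p_hat(x) = 1/(n*h) * sum_i K((x - x_i)/h),
    with K the standard normal density and h the bandwidth.
    """
    x_eval = np.asarray(x_eval, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # Scaled pairwise differences, shape (n_eval, n_samples).
    u = (x_eval[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Chain rule: d/dx K((x - x_i)/h) = -(u/h) * K(u),
    # so p_hat'(x) = 1/(n*h^2) * sum_i (-u_i * K(u_i)).
    return (-u * kernel).mean(axis=1) / bandwidth**2


# Compare against the true derivative of a standard normal density.
rng = np.random.default_rng(0)
samples = rng.standard_normal(20000)
grid = np.linspace(-2.0, 2.0, 9)
estimate = kde_derivative(grid, samples, bandwidth=0.3)
truth = -grid * np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
max_abs_error = np.max(np.abs(estimate - truth))
```

In this sketch the bandwidth is fixed by hand. A bandwidth tuned to make the density estimate accurate is generally not the one that makes its derivative accurate, which is one way to see why the two-step approach can underperform.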
