Bias Reduction and Metric Learning for Nearest-Neighbor Estimation of Kullback-Leibler Divergence

Nearest-neighbor estimators for the Kullback-Leibler (KL) divergence that are asymptotically unbiased have recently been proposed and demonstrated in a number of applications. However, with a small number of samples, nonparametric methods typically suffer from large estimation bias due to the nonlocality of information derived from nearest-neighbor statistics. In this letter, we show that this estimation bias can be mitigated by modifying the metric function, and we propose a novel method for learning a locally optimal Mahalanobis distance function from parametric generative models of the underlying density distributions. Using both simulations and experiments on a variety of data sets, we demonstrate that this interplay between approximate generative models and nonparametric techniques can significantly improve the accuracy of nearest-neighbor-based estimation of the KL divergence.
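To make the setting concrete, the following is a minimal sketch of the standard asymptotically unbiased k-nearest-neighbor KL estimator that this letter builds on (in the style of Wang, Kulkarni, and Verdú, 2009). The function name and the choice of NumPy are illustrative, not part of the letter; a learned Mahalanobis metric of the kind proposed here would correspond to linearly transforming the samples before computing these Euclidean distances.

```python
import numpy as np

def knn_kl_divergence(x, y, k=5):
    """k-NN estimate of KL(p || q) from samples x ~ p and y ~ q.

    x: (n, d) array of samples from p;  y: (m, d) array of samples from q.
    Uses Euclidean distances; a Mahalanobis metric A = L^T L is equivalent
    to applying the linear map L to all samples first.
    """
    x = np.atleast_2d(x).astype(float)
    y = np.atleast_2d(y).astype(float)
    n, d = x.shape
    m = y.shape[0]

    # Pairwise Euclidean distances via broadcasting.
    dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    dxy = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

    # rho: k-th NN distance of x_i within x (index 0 is the zero
    # self-distance, so the k-th neighbor sits at sorted index k).
    # nu:  k-th NN distance of x_i among the y samples.
    rho = np.sort(dxx, axis=1)[:, k]
    nu = np.sort(dxy, axis=1)[:, k - 1]

    # Asymptotically unbiased estimator of KL(p || q).
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

With few samples this plain-Euclidean estimator exhibits exactly the bias the letter targets; replacing the Euclidean metric with a locally learned Mahalanobis metric reduces it.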
