Theoretical Analysis of Density Ratio Estimation

Density ratio estimation has attracted a great deal of attention recently because it can be used for various data processing tasks. In this paper, we consider three methods of density ratio estimation: (A) the numerator and denominator densities are estimated separately and the ratio of the estimated densities is then computed; (B) a logistic regression classifier that discriminates denominator samples from numerator samples is learned and the ratio of the posterior probabilities is then computed; and (C) the density ratio function is modeled directly and learned by minimizing the empirical Kullback-Leibler divergence. We first prove that when the numerator and denominator densities are known to be members of the exponential family, (A) is better than (B) and (B) is better than (C). We then show that once the model assumption is violated, (C) is better than (A) and (B). Thus, in practical situations where no exact model is available, (C) would be the most promising approach to density ratio estimation.
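To make the three approaches concrete, the following is a minimal Python sketch on one-dimensional Gaussian data. Everything in it is an illustrative assumption rather than the paper's own setup: the function names (`ratio_a`, `ratio_b`, `ratio_c`), the Gaussian model with sufficient statistics (x, x^2), the sample sizes, and the use of Newton's method as the optimizer. For (C) it assumes a log-linear ratio model normalized so that its mean over the denominator samples equals one, which is one common way to minimize the empirical Kullback-Leibler divergence.

```python
import numpy as np

def stats(x):
    """Sufficient statistics (x, x^2) of a 1-D Gaussian (assumed model)."""
    return np.stack([x, x ** 2], axis=1)

def gauss_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def ratio_a(x_nu, x_de):
    """(A) Fit each density by Gaussian maximum likelihood, then divide."""
    mu_n, var_n = x_nu.mean(), x_nu.var()
    mu_d, var_d = x_de.mean(), x_de.var()
    return lambda x: gauss_pdf(x, mu_n, var_n) / gauss_pdf(x, mu_d, var_d)

def ratio_b(x_nu, x_de, n_iter=25):
    """(B) Logistic regression (numerator = class 1), fitted by Newton's method."""
    x_all = np.concatenate([x_nu, x_de])
    X = np.column_stack([np.ones(len(x_all)), stats(x_all)])
    y = np.concatenate([np.ones(len(x_nu)), np.zeros(len(x_de))])
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (p - y)                         # gradient of the negative log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # its Hessian
        theta -= np.linalg.solve(hess, grad)
    # Posterior odds equal exp(theta^T features); Bayes' rule adds the class-prior factor.
    prior = len(x_de) / len(x_nu)
    return lambda x: prior * np.exp(theta[0] + stats(x) @ theta[1:])

def ratio_c(x_nu, x_de, n_iter=25):
    """(C) Direct log-linear ratio model fitted by empirical KL minimization."""
    S_nu, S_de = stats(x_nu), stats(x_de)
    theta = np.zeros(S_de.shape[1])
    for _ in range(n_iter):
        z = S_de @ theta
        w = np.exp(z - z.max())
        w /= w.sum()                                 # normalized weights on denominator samples
        mean_w = w @ S_de
        grad = S_nu.mean(axis=0) - mean_w            # gradient of the concave objective
        centered = S_de - mean_w
        hess = (centered * w[:, None]).T @ centered  # weighted covariance = minus the Hessian
        theta += np.linalg.solve(hess, grad)         # Newton ascent step
    norm = np.mean(np.exp(S_de @ theta))             # enforces mean ratio 1 on denominator samples
    return lambda x: np.exp(stats(x) @ theta) / norm

# Hypothetical demo data: numerator N(0.5, 0.8^2), denominator N(0, 1).
rng = np.random.default_rng(0)
x_nu = rng.normal(0.5, 0.8, size=5000)
x_de = rng.normal(0.0, 1.0, size=5000)
grid = np.array([-1.0, 0.5, 1.5])
true_ratio = gauss_pdf(grid, 0.5, 0.8 ** 2) / gauss_pdf(grid, 0.0, 1.0)
estimates = {name: fit(x_nu, x_de)(grid)
             for name, fit in [("A", ratio_a), ("B", ratio_b), ("C", ratio_c)]}
for name, est in estimates.items():
    print(name, est, "true:", true_ratio)
```

Because the Gaussian model is correctly specified here, all three estimators should land close to the true ratio. A single run like this does not reproduce the ranking stated above, which concerns estimation accuracy in a theoretical sense; comparing the methods empirically would require repeating the experiment over many draws, and swapping in non-Gaussian data would probe the misspecified case.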
