Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation

Estimating the ratio of two probability densities has attracted a great deal of attention, because the density ratio can be used to address a variety of statistical tasks, such as covariate shift adaptation, outlier detection, and mutual information estimation. A naive approach to density-ratio approximation is to estimate the numerator and denominator densities separately and then take their ratio. However, this two-step approach does not perform well in practice: density estimation is itself a hard problem, and errors in the estimated denominator are magnified by the division, especially where the denominator density is small. Methods that estimate density ratios directly, without going through density estimation, have therefore been explored. In this paper, we first give a comprehensive review of existing density-ratio estimation methods and discuss their pros and cons. We then propose a new framework of density-ratio estimation in which a density-ratio model is fitted to the true density ratio under the Bregman divergence. Our framework includes existing approaches as special cases and is substantially more general. Finally, we develop a robust density-ratio estimation method under the power divergence, which is a novel instance of our framework.
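The Bregman-divergence fitting idea described in this abstract can be sketched in a few lines. The sketch assumes, as is standard for this construction, that the Bregman divergence between the true ratio r*(x) = p_nu(x)/p_de(x) and a model r(x) is averaged over the denominator density p_de. Expanding it and using p_de(x) r*(x) = p_nu(x) leaves, up to a constant, an objective that needs only samples from the two densities and no density estimates: the mean over denominator samples of f'(r(x)) r(x) - f(r(x)), minus the mean over numerator samples of f'(r(x)). Choosing f(t) = (t - 1)^2/2 gives a least-squares fit with a closed-form solution (in the style of unconstrained least-squares importance fitting, uLSIF), f(t) = t log t - t gives a KL-type objective, and f(t) = (t^(1+a) - t)/a gives a power-divergence objective. All function names, the Gaussian kernel model, the regularization parameter `lam`, and the synthetic data are illustrative assumptions, not the paper's implementation.

```python
import math
import random

# Hypothetical helper names throughout; a sketch, not the paper's code.
# Each generator returns (f, f') for a strictly convex f. Assumed instances:
#   squared(): f(t) = (t - 1)^2 / 2            -> least-squares (uLSIF-style) fit
#   kl():      f(t) = t log t - t              -> KL-type fit (needs r > 0)
#   power(a):  f(t) = (t^(1+a) - t) / a, a > 0 -> power-divergence fit (needs r >= 0)
def squared():
    return (lambda t: 0.5 * (t - 1.0) ** 2, lambda t: t - 1.0)

def kl():
    return (lambda t: t * math.log(t) - t, lambda t: math.log(t))

def power(a):
    return (lambda t: (t ** (1.0 + a) - t) / a,
            lambda t: ((1.0 + a) * t ** a - 1.0) / a)

def empirical_br(gen, r, x_nu, x_de):
    """Empirical BR objective for a ratio model r, with the term that depends
    only on the true ratio dropped:
        mean_{x in x_de} [f'(r(x)) r(x) - f(r(x))] - mean_{x in x_nu} f'(r(x))."""
    f, fp = gen
    de = sum(fp(r(x)) * r(x) - f(r(x)) for x in x_de) / len(x_de)
    nu = sum(fp(r(x)) for x in x_nu) / len(x_nu)
    return de - nu

def gauss_basis(centers, sigma):
    """Gaussian kernel features phi(x) for scalar inputs (assumed model)."""
    return lambda x: [math.exp(-(x - c) ** 2 / (2.0 * sigma ** 2)) for c in centers]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_squared(phi, x_nu, x_de, lam):
    """Squared-loss instance with r(x) = theta . phi(x). Its BR objective is
    0.5 theta' H theta - h' theta + const; adding 0.5 lam |theta|^2 gives the
    minimizer theta = (H + lam I)^{-1} h."""
    P_de = [phi(x) for x in x_de]
    P_nu = [phi(x) for x in x_nu]
    b = len(P_de[0])
    H = [[sum(p[i] * p[j] for p in P_de) / len(P_de) for j in range(b)]
         for i in range(b)]
    h = [sum(p[i] for p in P_nu) / len(P_nu) for i in range(b)]
    A = [[H[i][j] + (lam if i == j else 0.0) for j in range(b)] for i in range(b)]
    theta = solve(A, h)
    return theta, (lambda x: sum(t * v for t, v in zip(theta, phi(x))))

random.seed(0)
x_nu = [random.gauss(0.0, 1.0) for _ in range(200)]  # synthetic samples from p_nu
x_de = [random.gauss(0.5, 1.5) for _ in range(300)]  # synthetic samples from p_de
phi = gauss_basis(centers=x_nu[:10], sigma=1.0)
theta, r_hat = fit_squared(phi, x_nu, x_de, lam=0.1)
print(empirical_br(squared(), r_hat, x_nu, x_de))
```

For the squared-loss generator the objective is quadratic in theta, so a single linear solve suffices. The KL and power generators plug into `empirical_br` in the same way, but they generally call for iterative optimization and a model that stays positive (the linear kernel model above can go negative).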
