Paper information - Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation


Estimation of the ratio of probability densities has attracted a great deal of attention since it can be used for addressing various statistical paradigms. A naive approach to density-ratio approximation is to first estimate the numerator and denominator densities separately and then take their ratio. However, this two-step approach does not perform well in practice, and methods for directly estimating density ratios without density estimation have been explored. In this paper, we first give a comprehensive review of existing density-ratio estimation methods and discuss their pros and cons. Then we propose a new framework of density-ratio estimation in which a density-ratio model is fitted to the true density ratio under the Bregman divergence. Our new framework includes existing approaches as special cases, and is substantially more general. Finally, we develop a robust density-ratio estimation method under the power divergence, which is a novel instance in our framework.
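To make the idea of fitting a density-ratio model under a Bregman divergence concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not give the generator functions: the two used here (a squared-loss generator and a Kullback-Leibler-type generator) are standard choices assumed for illustration, and all function names are hypothetical. The sample-based objective follows from expanding the divergence and using the fact that the denominator density times the true ratio equals the numerator density, then dropping the term that does not depend on the model.

```python
import math


def bregman(f, df, t_true, t_model):
    """Pointwise Bregman divergence: f(t*) - f(t) - f'(t) * (t* - t)."""
    return f(t_true) - f(t_model) - df(t_model) * (t_true - t_model)


# Assumed generator 1: squared loss, f(t) = (t - 1)^2 / 2.
def f_sq(t):
    return 0.5 * (t - 1.0) ** 2


def df_sq(t):
    return t - 1.0


# Assumed generator 2: Kullback-Leibler type, f(t) = t log t - t (needs t > 0).
def f_kl(t):
    return t * math.log(t) - t


def df_kl(t):
    return math.log(t)


def empirical_objective(f, df, model, x_nu, x_de):
    """Sample-based matching objective, up to a model-independent constant.

    model : callable returning the ratio model value r(x)
    x_nu  : samples from the numerator density
    x_de  : samples from the denominator density
    """
    # Denominator term: average of f'(r(x)) * r(x) - f(r(x)).
    de_term = sum(df(model(x)) * model(x) - f(model(x)) for x in x_de) / len(x_de)
    # Numerator term: average of f'(r(x)).
    nu_term = sum(df(model(x)) for x in x_nu) / len(x_nu)
    return de_term - nu_term


def constant_model(c):
    """Hypothetical toy ratio model that ignores x and always returns c."""
    return lambda x: c
```

With the squared-loss generator the pointwise divergence reduces to half the squared difference between the true and model ratio values, and the empirical objective becomes half the mean of the squared model values over the denominator samples minus the mean of the model values over the numerator samples, up to a constant. Swapping in a different generator changes the loss without changing the surrounding code, which is the sense in which one framework can cover several estimators.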

Masashi Sugiyama | Takafumi Kanamori | Taiji Suzuki
