A Unified Framework for Multi-distribution Density Ratio Estimation

Binary density ratio estimation (DRE), the problem of estimating the ratio p1/p2 given empirical samples from each distribution, provides the foundation for many state-of-the-art machine learning algorithms such as contrastive representation learning and covariate shift adaptation. In this work, we consider a generalized setting where, given samples from multiple distributions p1, ..., pk (for k > 2), we aim to efficiently estimate the density ratios between all pairs of distributions. Such a generalization leads to important new applications, such as estimating statistical discrepancies among multiple random variables (e.g., multi-distribution f-divergences) and bias correction via multiple importance sampling. We then develop a general framework from the perspective of Bregman divergence minimization, where each strictly convex multivariate function induces a proper loss for multi-distribution DRE. Moreover, we rederive the theoretical connection between multi-distribution density ratio estimation and class probability estimation, justifying the use of any strictly proper scoring rule composite with a link function for multi-distribution DRE. We show that our framework leads to methods that strictly generalize their counterparts in binary DRE, as well as to new methods that show comparable or superior performance on various downstream tasks.
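As a minimal sketch of the connection to class probability estimation mentioned above (not the paper's exact method): the synthetic Gaussian data, the pairwise_ratio helper, and the choice of softmax logistic regression under the log loss as the strictly proper scoring rule are illustrative assumptions. The classifier estimates P(y = i | x), which is proportional to pi_i * p_i(x), so pairwise ratios follow by rescaling with the class priors.

# Sketch: multi-distribution DRE via multiclass class-probability estimation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
k, n_per, d = 3, 500, 2

# Samples from k distributions (here: Gaussians with shifted means).
samples = [rng.normal(loc=i, scale=1.0, size=(n_per, d)) for i in range(k)]
X = np.vstack(samples)
y = np.repeat(np.arange(k), n_per)

# Softmax classifier trained with log loss, a strictly proper scoring rule.
clf = LogisticRegression(max_iter=1000).fit(X, y)

def pairwise_ratio(x, i, j, priors=None):
    """Estimate p_i(x) / p_j(x) from predicted class probabilities.

    Uses p_i(x)/p_j(x) = (P(y=i|x) / P(y=j|x)) * (pi_j / pi_i),
    where pi_i is the empirical class prior of distribution i.
    """
    if priors is None:
        priors = np.full(k, 1.0 / k)  # equal sample sizes in this sketch
    proba = clf.predict_proba(np.atleast_2d(x))
    return (proba[:, i] / proba[:, j]) * (priors[j] / priors[i])

x_test = np.zeros((1, d))
print(pairwise_ratio(x_test, i=0, j=1))  # estimated p_1/p_2 at the origin

Any other probabilistic classifier trained with a strictly proper scoring rule (composite with its link) could stand in for the logistic regression here; this snippet only illustrates the reduction, not the Bregman-divergence framework itself.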
