Meta-Learning for Relative Density-Ratio Estimation

The ratio of two probability densities, called a density-ratio, is a vital quantity in machine learning. In particular, the relative density-ratio, a bounded extension of the density-ratio, has received much attention due to its stability and has been used in applications such as outlier detection and dataset comparison. Existing methods for (relative) density-ratio estimation (DRE) require many instances from both densities. In practice, however, sufficient instances are often unavailable. In this paper, we propose a meta-learning method for relative DRE, which estimates the relative density-ratio from a few instances by using knowledge from related datasets. Specifically, given two datasets that each consist of a few instances, our model extracts the datasets' information with neural networks and uses it to obtain instance embeddings appropriate for relative DRE. We model the relative density-ratio as a linear model on the embedded space, whose globally optimal solution can be obtained in closed form. The closed-form solution enables fast and effective adaptation to a few instances, and its differentiability lets us train the model so that the expected test error for relative DRE is explicitly minimized after adapting to a few instances. We empirically demonstrate the effectiveness of the proposed method on three problems: relative DRE, dataset comparison, and outlier detection.
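The closed-form linear model the abstract refers to follows the least-squares approach to relative DRE (RuLSIF, Kanamori et al., 2011), on which this work builds. As a minimal sketch of that building block (not the proposed meta-learned embeddings), the α-relative ratio r_α(x) = p(x) / (α p(x) + (1−α) q(x)) can be modeled as a linear combination of Gaussian kernels, whose coefficients solve a ridge-regularized linear system; all function names and hyperparameter values below are illustrative assumptions:

```python
import numpy as np

def rulsif_fit(x_num, x_den, alpha=0.5, sigma=1.0, lam=0.1):
    """Closed-form relative density-ratio estimation (RuLSIF-style sketch).

    Models r_alpha(x) = p(x) / (alpha*p(x) + (1-alpha)*q(x)) as a linear
    combination of Gaussian kernels centered at the numerator samples.
    """
    centers = x_num  # kernel centers; using numerator samples is a common choice

    def phi(x):
        # Gaussian kernel features: phi_j(x) = exp(-||x - c_j||^2 / (2 sigma^2))
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    Phi_p, Phi_q = phi(x_num), phi(x_den)
    # H-hat: empirical second moment of the features under the alpha-mixture
    H = (alpha * Phi_p.T @ Phi_p / len(x_num)
         + (1 - alpha) * Phi_q.T @ Phi_q / len(x_den))
    # h-hat: empirical first moment of the features under p
    h = Phi_p.mean(axis=0)
    # Global optimum of the regularized squared loss, in closed form
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: phi(x) @ theta

rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, size=(200, 1))   # samples from p
x_q = rng.normal(0.5, 1.0, size=(200, 1))   # samples from q
r = rulsif_fit(x_p, x_q, alpha=0.5)
print(float(r(np.zeros((1, 1)))))           # estimated r_alpha at x = 0
```

Because the true α-relative ratio is bounded above by 1/α (here, 2), the estimate stays stable even where q has little mass; the differentiability of the `np.linalg.solve` step with respect to the features is what allows the paper's model to backpropagate the expected test error through the adaptation.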
