Continuous Target Shift Adaptation in Supervised Learning

Supervised learning concerns inferring the underlying relation between a covariate x and a target y from paired covariate-target training data. It is traditionally assumed that the training data and the test data, on which the generalization performance of a learning algorithm is measured, follow the same probability distribution. However, this assumption is often violated in real-world applications such as computer vision, natural language processing, robot control, and survey design, because of intrinsic non-stationarity of the environment or unavoidable sample selection bias. This situation is called dataset shift and has recently attracted a great deal of attention. In this paper, we consider supervised learning under the target shift scenario, where the target marginal distribution p(y) changes between the training and test phases while the target-conditioned covariate distribution p(x|y) remains unchanged. Although various methods for mitigating target shift in classification (a.k.a. class-prior change) have been developed, few of them can be applied to continuous targets. We propose methods for continuous target shift adaptation in regression and conditional density estimation. More specifically, our contribution is a novel importance weight estimator for continuous targets. Experiments demonstrate the usefulness of the proposed method.
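
To make the role of importance weights under target shift concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes a toy setting that does not appear in the source: training targets follow N(0, 1), test targets follow N(1, 1), and x = y + Gaussian noise in both phases, so p(x|y) is unchanged. The weight w(y) = p_test(y) / p_train(y) is computed from these known densities rather than estimated, and the helper names (`weighted_linear_fit`, `true_weight`, `run_demo`) are hypothetical.

```python
import math
import random


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y ~ a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return my - b * mx, b


def true_weight(y):
    """Assumed-known density ratio N(y; 1, 1) / N(y; 0, 1) = exp(y - 0.5)."""
    return math.exp(y - 0.5)


def run_demo(n=20000, seed=0):
    """Fit on training data only, with and without importance weights."""
    rng = random.Random(seed)
    # Training phase: y ~ N(0, 1); x is drawn from the fixed p(x|y).
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xs = [y + rng.gauss(0.0, 1.0) for y in ys]
    unweighted = weighted_linear_fit(xs, ys, [1.0] * n)
    weighted = weighted_linear_fit(xs, ys, [true_weight(y) for y in ys])
    return unweighted, weighted


if __name__ == "__main__":
    (ua, ub), (wa, wb) = run_demo()
    print(f"unweighted fit: intercept={ua:.3f}, slope={ub:.3f}")
    print(f"weighted fit:   intercept={wa:.3f}, slope={wb:.3f}")
```

In this toy setting the best linear predictor of y from x is x/2 under the training distribution and 0.5 + x/2 under the test distribution, so the weighted fit should move the intercept toward 0.5 while the unweighted fit stays near 0. In practice the test targets are unobserved and w(y) has to be estimated, which is the problem the paper addresses.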
