Mixture Regression for Covariate Shift

Supervised learning typically presumes that the training and test points are drawn from the same distribution. In practice this assumption is commonly violated; the situation in which the training and test data come from different distributions is called covariate shift. Recent work has examined techniques for dealing with covariate shift in terms of minimisation of generalisation error, but as yet the literature lacks a Bayesian generative perspective on this problem. This paper tackles the issue for regression models. Recent work on covariate shift can be understood in terms of mixture regression, and using this view we obtain a general approach to regression under covariate shift that reproduces previous work as a special case. The main advantages of this formulation over previous models for covariate shift are that the test and training densities no longer need to be presumed known, that regression and density estimation are combined into a single procedure, and that previous methods are recovered as special cases of this procedure, shedding light on the implicit assumptions those methods make.
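
To make the mixture-regression view concrete, here is a minimal sketch (not the paper's implementation) of EM for a two-component mixture of linear regressions that models the joint density p(x, y) = Σ_k π_k p(x | k) p(y | x, k), so that density estimation and regression are fitted in a single procedure. The function name, the 1-D Gaussian input model, the linear Gaussian regressors, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def fit_mixture_regression(x, y, K=2, n_iter=100, seed=0):
    """EM for a joint mixture p(x, y) = sum_k pi_k N(x | mu_k, tau_k^2)
    N(y | a_k x + b_k, sigma_k^2): input-density estimation and regression
    are updated together in one procedure."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Random initialisation of per-point component responsibilities.
    r = rng.dirichlet(np.ones(K), size=n)
    for _ in range(n_iter):
        # M-step: refit each component from its responsibility-weighted data.
        pi = r.mean(axis=0)                                   # mixing weights
        mu = (r * x[:, None]).sum(0) / r.sum(0)               # input means
        tau2 = (r * (x[:, None] - mu) ** 2).sum(0) / r.sum(0) # input variances
        a = np.empty(K); b = np.empty(K); sigma2 = np.empty(K)
        X = np.column_stack([x, np.ones(n)])
        for k in range(K):
            # Weighted least squares for the k-th regression line.
            W = np.diag(r[:, k])
            a[k], b[k] = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
            resid = y - (a[k] * x + b[k])
            sigma2[k] = (r[:, k] * resid ** 2).sum() / r[:, k].sum()
        # E-step: responsibilities from the joint density of (x, y).
        log_p = (np.log(pi)
                 - 0.5 * np.log(2 * np.pi * tau2)
                 - 0.5 * (x[:, None] - mu) ** 2 / tau2
                 - 0.5 * np.log(2 * np.pi * sigma2)
                 - 0.5 * (y[:, None] - (a * x[:, None] + b)) ** 2 / sigma2)
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exp
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
    return pi, mu, tau2, a, b, sigma2

# Toy data: two regimes with different input densities and regression lines,
# a simple stand-in for a training/test covariate mismatch.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200)])
y = np.concatenate([1.0 * x[:200] + rng.normal(0, 0.3, 200),
                    -0.5 * x[200:] + 3 + rng.normal(0, 0.3, 200)])
print(fit_mixture_regression(x, y))
```

Because each component carries its own input density p(x | k), the responsibilities act as data-dependent weights on each regression fit; this is the mechanism by which, on the abstract's account, density-weighting approaches to covariate shift can emerge as special cases of the mixture model.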
