Domain Adaptation under Target and Conditional Shift

Let X denote the features and Y the target. We consider domain adaptation under three possible scenarios: (1) the marginal P(Y) changes while the conditional P(X|Y) stays the same (target shift); (2) the marginal P(Y) is fixed while the conditional P(X|Y) changes under certain constraints (conditional shift); and (3) both the marginal P(Y) and the conditional P(X|Y) change under constraints (generalized target shift). Given background knowledge, causal interpretations allow us to determine which situation applies to the problem at hand. We exploit importance reweighting or sample transformation to find a learning machine that works well on the test data, and propose to estimate the weights or transformations by reweighting or transforming the training data so as to reproduce the covariate distribution on the test domain. Thanks to kernel embeddings of conditional as well as marginal distributions, the proposed approaches avoid explicit distribution estimation and are applicable to high-dimensional problems. Numerical evaluations on synthetic and real-world data sets demonstrate the effectiveness of the proposed framework.
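To make the target-shift case concrete, the following sketch estimates per-class importance weights β(y) = P_te(y)/P_tr(y) by matching a weighted mixture of class-conditional kernel mean embeddings to the kernel mean of the test features, assuming P(X|Y) is shared across domains. The function name, the plain least-squares solve, and the fixed RBF bandwidth are illustrative assumptions, not the paper's exact estimator (which constrains the weights more carefully):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def target_shift_weights(X_tr, y_tr, X_te, gamma=1.0):
    """Estimate beta(y) = P_te(y)/P_tr(y) under target shift.

    Matches sum_c p_c * mu[X|Y=c] to the test kernel mean mu_te,
    with both embeddings evaluated at the training points.
    """
    classes = np.unique(y_tr)
    # Empirical embedding of each class-conditional P(X|Y=c).
    K = rbf_kernel(X_tr, X_tr, gamma)
    M = np.stack([K[:, y_tr == c].mean(axis=1) for c in classes], axis=1)
    # Empirical embedding of the test marginal P_te(X).
    mu_te = rbf_kernel(X_tr, X_te, gamma).mean(axis=1)
    # Least-squares mixture proportions, clipped to stay valid priors.
    p, *_ = np.linalg.lstsq(M, mu_te, rcond=None)
    p = np.clip(p, 1e-8, None)
    p /= p.sum()                                  # estimated test priors
    p_tr = np.array([(y_tr == c).mean() for c in classes])
    return dict(zip(classes, p / p_tr))           # beta per class
```

The returned weights can be plugged into any weighted learner (e.g. weighted logistic regression or SVM) so that training minimizes an estimate of the test-domain risk.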
