Hypothesis Transfer Learning via Transformation Functions

We consider the Hypothesis Transfer Learning (HTL) problem, in which a hypothesis trained on a source domain is incorporated into the learning procedure for a target domain. Existing theoretical analyses either study only specific algorithms or present upper bounds on the generalization error but not on the excess risk. In this paper, we propose a unified, algorithm-dependent framework for HTL through a novel notion of transformation function, which characterizes the relation between the source and target domains. We conduct a general risk analysis of this framework and, in particular, show for the first time that when the two domains are related, HTL enjoys faster convergence rates of excess risk for Kernel Smoothing and Kernel Ridge Regression than in the classical non-transfer learning setting. Experiments on real-world data demonstrate the effectiveness of our framework.
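The HTL idea above can be illustrated with a minimal sketch, not the paper's actual algorithm: assume the simplest transformation function, an additive offset, so the target regression function is the source hypothesis plus a smooth correction. We fit Kernel Ridge Regression on abundant source data, then fit a second KRR to the residuals of the scarce target data. The kernel choice, regularization values, and synthetic domains below are all illustrative assumptions.

```python
import numpy as np

def krr_fit(X, y, lam=0.1, gamma=1.0):
    """Gaussian-kernel ridge regression: alpha = (K + lam*I)^{-1} y."""
    K = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return X, alpha, gamma

def krr_predict(model, Xq):
    Xtr, alpha, gamma = model
    K = np.exp(-gamma * ((Xq[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1))
    return K @ alpha

rng = np.random.default_rng(0)

# Source domain: abundant data from f_src(x) = sin(x).
Xs = rng.uniform(-3, 3, (200, 1))
ys = np.sin(Xs[:, 0]) + 0.05 * rng.standard_normal(200)

# Related target domain: few samples from f_tgt(x) = sin(x) + 0.5*x,
# i.e. the source function plus a smooth additive offset.
Xt = rng.uniform(-3, 3, (10, 1))
yt = np.sin(Xt[:, 0]) + 0.5 * Xt[:, 0] + 0.05 * rng.standard_normal(10)

# HTL: learn the source hypothesis, then fit KRR to the target residuals
# y - h_src(x); the final predictor is h_src plus the learned offset.
src = krr_fit(Xs, ys)
aux = krr_fit(Xt, yt - krr_predict(src, Xt))
def h_tgt(Xq):
    return krr_predict(src, Xq) + krr_predict(aux, Xq)

# Baseline: train from scratch on the scarce target data only.
scratch = krr_fit(Xt, yt)

Xe = np.linspace(-3, 3, 300)[:, None]
truth = np.sin(Xe[:, 0]) + 0.5 * Xe[:, 0]
mse_htl = np.mean((h_tgt(Xe) - truth) ** 2)
mse_scratch = np.mean((krr_predict(scratch, Xe) - truth) ** 2)
```

Because the residual (the offset 0.5*x) is smoother and smaller in magnitude than the full target function, the transferred predictor typically needs far fewer target samples for the same accuracy, which is the intuition behind the faster excess-risk rates the paper proves for related domains.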
