Transfer Learning via Learning to Transfer

In transfer learning, what and how to transfer are the two key questions to be addressed: different transfer learning algorithms applied between a source and a target domain transfer different knowledge and thereby yield different performance improvements in the target domain. Determining the optimal choice that maximizes the improvement requires either exhaustive exploration or considerable expertise. Meanwhile, it is widely accepted in educational psychology that humans improve their transfer learning skill of deciding what to transfer through meta-cognitive reflection on inductive transfer learning practices. Motivated by this, we propose a novel transfer learning framework, known as Learning to Transfer (L2T), which automatically determines the best choice of what and how to transfer by leveraging previous transfer learning experiences. We establish the L2T framework in two stages: 1) we learn a reflection function that encodes transfer learning skills from past experiences; and 2) for a future pair of domains, we infer the best choice of what and how to transfer by optimizing the reflection function. We also theoretically analyse the algorithmic stability and generalization bound of L2T, and empirically demonstrate its superiority over several state-of-the-art transfer learning algorithms.
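The two-stage idea can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual method: the meta-feature `gap` (a scalar domain-discrepancy measure), the candidate algorithm names, and the kernel-smoothed reflection function are all simplified stand-ins for the learned reflection function and its optimization in L2T.

```python
import math

# Hypothetical transfer learning "experiences": each records a meta-feature
# describing a source/target domain pair (here a single discrepancy score),
# the transfer algorithm applied, and the improvement observed in the target.
experiences = [
    {"gap": 0.2, "algo": "feature_align",     "improvement": 0.12},
    {"gap": 0.2, "algo": "instance_reweight", "improvement": 0.05},
    {"gap": 0.8, "algo": "feature_align",     "improvement": 0.03},
    {"gap": 0.8, "algo": "instance_reweight", "improvement": 0.10},
]

def reflection(gap, algo, bandwidth=0.3):
    """Stage 1 stand-in: a kernel-smoothed estimate of expected improvement,
    fit from past experiences (plays the role of the reflection function)."""
    num = den = 0.0
    for e in experiences:
        if e["algo"] != algo:
            continue
        # Gaussian kernel: experiences on similar domain pairs weigh more.
        w = math.exp(-((gap - e["gap"]) ** 2) / (2 * bandwidth ** 2))
        num += w * e["improvement"]
        den += w
    return num / den if den else 0.0

def learn_to_transfer(gap, algos=("feature_align", "instance_reweight")):
    """Stage 2 stand-in: for a new domain pair, pick what/how to transfer
    by maximizing the reflection function over candidate algorithms."""
    return max(algos, key=lambda a: reflection(gap, a))
```

Under this toy model, a new domain pair with a small discrepancy (`gap=0.25`) is steered toward the algorithm that worked well on similar past pairs, while a large-discrepancy pair (`gap=0.75`) is steered toward the other, i.e., the choice adapts to the new pair instead of being fixed in advance.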
