Active Transfer Learning

A major assumption in data mining and machine learning is that the training and test sets come from the same domain: they share the same feature space and follow the same distribution. In many real-world applications, however, the training and test sets come from different domains. Negative similarities may then arise between domains, leading to the negative transfer problem. In this paper, we propose a novel method named active transfer learning (ATL) to address this problem. Specifically, an orthogonal projection matrix and a weight coefficient vector are introduced to extend maximum mean discrepancy (MMD), so that the MMD between domains is minimized while negative transfer is simultaneously suppressed. To find informative and discriminative subsets of the source domain, we then propose an information diversity term that exploits the local geometric structure of the source samples. In addition, by using the label information of the source samples, our method guarantees that the selected subsets are as discriminative as possible. Finally, to implement the proposed method efficiently, an alternating optimization approach based on the alternating direction method of multipliers (ADMM) is designed to solve the optimization problem. To demonstrate the effectiveness of the proposed ATL model, experiments are conducted on five real-world data sets; the results show the superiority of our method over state-of-the-art methods.
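As a rough illustration of the MMD extension described above, the sketch below computes a weighted maximum mean discrepancy between projected source and target samples. The projection matrix `P`, the source weight vector `alpha`, and the use of a linear kernel (mean embeddings in the projected space) are illustrative assumptions; the paper's actual objective, constraints, and ADMM solver are not reproduced here.

```python
# Illustrative sketch (not the paper's exact formulation): a weighted MMD
# between source samples X_s and target samples X_t after a linear projection.
# P (projection with orthonormal columns) and alpha (source weight vector)
# stand in for the orthogonal projection matrix and weight coefficients in ATL.
import numpy as np

def weighted_projected_mmd(X_s, X_t, P, alpha):
    """Squared MMD between the alpha-weighted projected source samples and the
    uniformly weighted projected target samples, using a linear kernel."""
    Z_s = X_s @ P                  # projected source samples, shape (n_s, d)
    Z_t = X_t @ P                  # projected target samples, shape (n_t, d)
    mu_s = alpha @ Z_s             # weighted source mean embedding
    mu_t = Z_t.mean(axis=0)        # target mean embedding
    diff = mu_s - mu_t
    return float(diff @ diff)      # squared distance between mean embeddings

# Toy usage with random data, a random orthonormal projection, and uniform weights.
rng = np.random.default_rng(0)
X_s = rng.normal(size=(50, 10))
X_t = rng.normal(loc=0.5, size=(40, 10))
P, _ = np.linalg.qr(rng.normal(size=(10, 3)))  # orthonormal columns
alpha = np.full(50, 1.0 / 50)                  # uniform source weights summing to 1
print(weighted_projected_mmd(X_s, X_t, P, alpha))
```

In the full method, `alpha` and `P` would be optimized (e.g., via the ADMM-based alternating scheme) to minimize this discrepancy under orthogonality and weight constraints, rather than being fixed as in this toy example.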
