Conformity-based source subset selection for instance transfer

Abstract: Instance transfer aims to improve prediction models for a target domain by transferring data from related source domains. Its effectiveness depends on how relevant the source data are to the target domain. When that relevance is limited, the only option is to select a subset of the source data whose relevance is acceptable. In this paper, we introduce three algorithms that perform source-subset selection prior to model training. The algorithms employ a conformity-based test that estimates source-subset relevance either from individual instances or from subsets as a whole. Experiments conducted on four real-world data sets demonstrated the effectiveness of the proposed algorithms. In particular, pre-training subset selection based on set relevance was shown to be capable of outperforming existing instance-transfer techniques.
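The abstract does not spell out the three algorithms, so the following is only an illustrative sketch of what instance-level, conformity-based selection could look like, not the authors' method. It assumes a k-nearest-neighbour nonconformity score (mean distance to the k nearest target instances with the same label) and keeps a source instance when its conformal p-value against the target data exceeds a threshold. The names `knn_nonconformity` and `select_source_instances`, the parameters `k` and `epsilon`, and the choice of score are all hypothetical.

```python
import numpy as np


def knn_nonconformity(x, y, X_ref, y_ref, k=3, exclude=None):
    """Assumed score: mean distance from x to its k nearest
    reference points that carry the same label y."""
    mask = (y_ref == y)
    if exclude is not None:
        mask[exclude] = False  # leave-one-out for reference points
    same = X_ref[mask]
    if len(same) == 0:
        return np.inf  # no same-label reference point: maximally strange
    dists = np.sort(np.linalg.norm(same - x, axis=1))
    return float(dists[:k].mean())


def select_source_instances(X_src, y_src, X_tgt, y_tgt, epsilon=0.1, k=3):
    """Return indices of source instances whose conformal p-value
    with respect to the target data is greater than epsilon."""
    n = len(X_tgt)
    # Leave-one-out nonconformity scores of the target instances.
    tgt_scores = np.array([
        knn_nonconformity(X_tgt[i], y_tgt[i], X_tgt, y_tgt, k, exclude=i)
        for i in range(n)
    ])
    keep = []
    for j in range(len(X_src)):
        a = knn_nonconformity(X_src[j], y_src[j], X_tgt, y_tgt, k)
        # p-value: share of scores (including the candidate's own)
        # that are at least as nonconforming as the candidate.
        p = (np.sum(tgt_scores >= a) + 1) / (n + 1)
        if p > epsilon:
            keep.append(j)
    return np.array(keep, dtype=int)
```

Under these assumptions, the selected indices would be used to build the training set from the target data plus the retained source instances. A set-level variant, which the abstract reports as the stronger option, would test a whole candidate subset at once rather than each instance separately; that is not shown here.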
