Largest Source Subset Selection for Instance Transfer

Instance-transfer learning has emerged as a promising learning framework to boost the performance of prediction models on newly arrived tasks. The success of the framework depends on the relevance of the source data to the target data. This paper proposes a new approach to source data selection for instance-transfer learning. The approach is capable of selecting the largest subset S of the source data whose relevance to the target data is statistically guaranteed to be the highest among all supersets of S. The approach is formally described and theoretically justified. Experimental results on real-world data sets demonstrate that