Kernel Mean Matching with a Large Margin

Various instance weighting methods have been proposed for instance-based transfer learning. Kernel Mean Matching (KMM) is a representative instance weighting approach that estimates instance importance by matching the source and target distributions in a universal reproducing kernel Hilbert space (RKHS). However, KMM is an unsupervised approach that does not exploit the class labels of the source data. In this paper, we extend KMM by leveraging the class label knowledge and integrate KMM and SVM into a unified optimization framework called KMM-LM (Large Margin). The objective of KMM-LM is to simultaneously maximize the geometric soft margin and minimize both the empirical classification error and the KMM-based domain discrepancy. KMM-LM uses an iterative minimization algorithm to find the optimal weight vector of the classification decision hyperplane and the importance weight vector of the source-domain instances. Experiments show that KMM-LM outperforms state-of-the-art baselines.
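For concreteness, below is a minimal NumPy sketch of the standard, unsupervised KMM step that KMM-LM builds on: it estimates importance weights beta by minimizing 0.5 * b'Kb - kappa'b over the box 0 <= b_i <= B, which matches the weighted source kernel mean to the target kernel mean in the RKHS. This is a simplified illustration, not the paper's algorithm: the usual normalization constraint |sum_i beta_i / n_s - 1| <= epsilon is dropped, a QP solver is replaced by projected gradient descent, and all names (`rbf_kernel`, `kmm_weights`, `gamma`, `B`) are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kmm_weights(Xs, Xt, gamma=1.0, B=10.0, n_iter=2000):
    # Minimize 0.5 * b'Kb - kappa'b subject to 0 <= b_i <= B,
    # i.e. match the beta-weighted source kernel mean to the target
    # kernel mean; solved here by projected gradient descent.
    ns, nt = len(Xs), len(Xt)
    K = rbf_kernel(Xs, Xs, gamma)                         # source Gram matrix
    kappa = (ns / nt) * rbf_kernel(Xs, Xt, gamma).sum(1)  # cross-kernel term
    beta = np.ones(ns)
    lr = 1.0 / ns  # safe step: lambda_max(K) <= trace(K) = ns for an RBF kernel
    for _ in range(n_iter):
        beta = np.clip(beta - lr * (K @ beta - kappa), 0.0, B)  # box projection
    return beta

# Illustrative usage: reweight a source sample toward a mean-shifted target.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(100, 2))
Xt = rng.normal(0.5, 1.0, size=(80, 2))
beta = kmm_weights(Xs, Xt)
```

In KMM-LM the weights would not be fixed up front as in this sketch: the alternating scheme re-estimates beta jointly with the SVM hyperplane, for instance by feeding the weights into a soft-margin SVM as per-instance costs (e.g., scikit-learn's `SVC.fit(Xs, ys, sample_weight=beta)`) and then updating beta against the resulting classifier. The code above corresponds only to the unsupervised KMM component.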
