Healing Sample Selection Bias by Source Classifier Selection

Domain adaptation (DA) methods typically reduce the marginal distribution difference between the source and target domains and then deploy the resulting trained classifier, referred to as the source classifier, in the target domain. In many cases, however, the true predictive distributions of the source and target domains can differ substantially, especially when their class distributions are skewed, which gives rise to sample selection bias in DA. DA methods that leverage the labeled source data may therefore generalize poorly in the target domain, resulting in negative transfer. In addition, we observe that many DA methods predict the target unlabeled data with either a single source classifier or a linear combination of source classifiers with fixed weights; in effect, the labels of the target unlabeled data are spanned by the predictions of these source classifiers. Motivated by these observations, in this paper we propose to construct many source classifiers with diverse biases and to learn a weight for each classifier by directly minimizing the structural risk defined on the target unlabeled data, so as to heal the possible sample selection bias. Since the weights are learned by maximizing the margin of separation between opposite classes on the target unlabeled data, the proposed method is named Maximal Margin Target Label Learning (MMTLL); it takes the form of a Multiple Kernel Learning problem with many label kernels. Extensive experiments comparing MMTLL with several state-of-the-art methods on the Sentiment and Newsgroups datasets under various imbalanced class settings show that MMTLL achieves robust accuracy in all settings and is resilient to negative transfer, whereas the competing methods suffer significant losses in prediction accuracy.
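To make the weighting idea concrete, the sketch below learns a convex combination of pre-trained source classifiers by minimizing a hinge-style margin-violation proxy on the target unlabeled data. This is a minimal illustration, not the authors' MMTLL solver: the helper names (`source_predictions`, `margin_objective`, `learn_weights`) and the proxy objective itself are assumptions introduced here.

```python
import numpy as np
from scipy.optimize import minimize

def source_predictions(classifiers, X_target):
    """Stack each source classifier's real-valued decision scores on the
    target unlabeled data; row m holds classifier m's scores."""
    return np.vstack([clf.decision_function(X_target) for clf in classifiers])

def margin_objective(beta, F):
    """Margin-violation proxy: the combined score f = beta @ F should lie
    far from the decision boundary, so penalize |f| < 1 with a hinge."""
    f = beta @ F
    return np.mean(np.maximum(0.0, 1.0 - np.abs(f)))

def learn_weights(F):
    """Learn simplex weights over the source classifiers by minimizing the
    margin-violation proxy on the target unlabeled data (no target labels)."""
    m = F.shape[0]
    beta0 = np.full(m, 1.0 / m)                  # start from uniform weights
    constraints = ({"type": "eq", "fun": lambda b: b.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * m                    # convex combination
    result = minimize(margin_objective, beta0, args=(F,),
                      bounds=bounds, constraints=constraints)
    return result.x

# Usage: F = source_predictions(trained_classifiers, X_target_unlabeled)
#        beta = learn_weights(F)
#        y_target = np.sign(beta @ F)            # predicted target labels
```

In the full MMTLL formulation the candidate labelings enter through label kernels and the weights are found with a Multiple Kernel Learning solver under a class-balance constraint, which rules out the degenerate all-one-class labeling that a naive margin proxy like the one above can fall into.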
