Parsimonious unsupervised and semi-supervised domain adaptation with good similarity functions

In this paper, we address the problem of domain adaptation for binary classification. This problem arises when the distribution generating the source learning data differs from the one generating the target test data. From a theoretical standpoint, a classifier has better generalization guarantees when the marginal distributions of the two domains over the input space are close. Classical approaches mainly try to build new projection spaces or to reweight the source data in order to bring the two distributions closer. We study an original direction based on a recent framework introduced by Balcan et al., which makes it possible to learn linear classifiers in an explicit projection space based on a similarity function that need be neither symmetric nor positive semi-definite. We propose a well-founded general method for learning a low-error classifier on target data, made effective by an iterative procedure compatible with Balcan et al.’s framework. A reweighting scheme of the similarity function is then introduced to bring the distributions closer in a new projection space. The hyperparameters and the reweighting quality are controlled by a reverse validation procedure. Our approach is based on a linear programming formulation and shows good adaptation performance with very sparse models. We first consider the challenging unsupervised case where no target label is accessible, which is useful when manual annotation is not possible. We also propose a generalization to the semi-supervised case, which allows us to use a few target labels when they are available. Finally, we evaluate our method on a synthetic problem and on a real image annotation task.
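To make the idea of an explicit, similarity-based projection space concrete, here is a minimal sketch in Python. It is not the authors' method: it performs no domain adaptation and no reweighting, and a simple proximal subgradient loop stands in for a linear programming solver. It only illustrates mapping each example to its similarities with a set of landmark points and fitting an L1-regularized hinge-loss linear classifier in that space. The function names (`similarity`, `project`, `train`, `predict`), the toy one-dimensional data, the bandwidth rule, and the values of `lam` and `step` are all assumptions made for illustration.

```python
import math

def similarity(x, xp):
    # Toy similarity with values in (0, 1]. The bandwidth depends only on the
    # second argument, so K(x, xp) != K(xp, x) in general (hypothetical choice).
    width = 2.0 if xp > 0 else 1.0
    return math.exp(-((x - xp) ** 2) / (2.0 * width ** 2))

def project(x, landmarks):
    # Explicit projection: one coordinate per landmark point.
    return [similarity(x, lm) for lm in landmarks]

def objective(alpha, feats, labels, lam):
    # Mean hinge loss plus an L1 penalty on the weights.
    hinge = sum(max(0.0, 1.0 - y * sum(a * p for a, p in zip(alpha, phi)))
                for phi, y in zip(feats, labels))
    return hinge / len(feats) + lam * sum(abs(a) for a in alpha)

def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrink v toward zero by t.
    return math.copysign(max(abs(v) - t, 0.0), v)

def train(feats, labels, lam=0.01, step=0.1, epochs=200):
    # Proximal subgradient loop (an assumed stand-in for an LP solver).
    # Subgradient steps are not monotone, so the best iterate is kept.
    n, d = len(feats), len(feats[0])
    alpha = [0.0] * d
    best, best_obj = list(alpha), objective(alpha, feats, labels, lam)
    for _ in range(epochs):
        grad = [0.0] * d
        for phi, y in zip(feats, labels):
            if y * sum(a * p for a, p in zip(alpha, phi)) < 1.0:
                for j in range(d):
                    grad[j] -= y * phi[j] / n
        alpha = [soft_threshold(a - step * g, step * lam)
                 for a, g in zip(alpha, grad)]
        obj = objective(alpha, feats, labels, lam)
        if obj < best_obj:
            best, best_obj = list(alpha), obj
    return best

def predict(x, alpha, landmarks):
    # Linear classifier in the projection space (no bias term).
    score = sum(a * k for a, k in zip(alpha, project(x, landmarks)))
    return 1 if score >= 0 else -1

# Hypothetical toy data: negatives on the left, positives on the right.
points = [-2.0, -1.0, 1.0, 2.0]
labels = [-1, -1, 1, 1]
landmarks = [-1.5, 1.5]

feats = [project(x, landmarks) for x in points]
alpha = train(feats, labels)
predictions = [predict(x, alpha, landmarks) for x in points]
print("weights:", alpha)
print("predictions:", predictions)
```

The learned weight vector has one entry per landmark, so driving entries to zero with the L1 penalty corresponds to discarding landmarks, which is one way to read the "very sparse models" mentioned in the abstract.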
