On the Usefulness of Similarity Based Projection Spaces for Transfer Learning

Similarity functions are widely used in many machine learning and pattern recognition tasks. We consider here a recent framework for binary classification, proposed by Balcan et al., that allows learning in a potentially non-geometrical space defined by good similarity functions. This framework generalizes the notion of kernels used in support vector machines, in that the similarity functions need be neither positive semi-definite nor symmetric. The similarities are used to define an explicit projection space in which a linear classifier with good generalization properties can be learned. In this paper, we study experimentally the usefulness of similarity-based projection spaces for transfer learning. More precisely, we consider the problem of domain adaptation, where the distributions generating the learning data and the test data differ, and we focus on the case where no information on the test labels is available. We show that a simple renormalization of a good similarity function, taking the test data into account, allows us to learn classifiers that perform better on the target distribution for difficult adaptation problems. Moreover, this normalization consistently improves the model when we regularize the similarity-based projection space so as to bring the two distributions closer. We provide experiments on a toy problem and on a real image annotation task.
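
The following is a minimal Python sketch, given only to make the projection explicit: each example is represented by its vector of similarities to a set of landmark points, and a linear classifier is learned in that space. The Gaussian similarity, the random choice of landmarks, the renormalization of each landmark coordinate by its average similarity to the unlabeled target sample, and the use of an L2-regularized linear SVM (in place of the sparse linear formulation of Balcan et al.) are illustrative assumptions, not the exact procedure evaluated in the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

def gaussian_similarity(X, landmarks, gamma=1.0):
    """Pairwise Gaussian similarity between rows of X and the landmark points."""
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def project(X, landmarks, X_target=None, gamma=1.0):
    """Explicit similarity-based projection: one coordinate per landmark.

    The similarity need not be positive semi-definite nor symmetric.
    If unlabeled target data are given, each coordinate is rescaled by its
    average similarity to the target sample (an illustrative normalization).
    """
    S = gaussian_similarity(X, landmarks, gamma)
    if X_target is not None:
        scale = gaussian_similarity(X_target, landmarks, gamma).mean(axis=0)
        S = S / (scale + 1e-12)
    return S

# Usage sketch: synthetic labeled source sample and shifted, unlabeled target sample.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(200, 2))
y_src = (X_src[:, 0] > 0).astype(int)
X_tgt = rng.normal(loc=0.5, size=(200, 2))  # unlabeled target data

landmarks = X_src[rng.choice(len(X_src), 20, replace=False)]
clf = LinearSVC(C=1.0).fit(project(X_src, landmarks, X_tgt), y_src)
target_predictions = clf.predict(project(X_tgt, landmarks, X_tgt))
```

In this sketch the only information taken from the target domain is the unlabeled sample used for the renormalization, which matches the setting studied in the paper, where no target labels are available.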

[1] M. Kawanabe et al. Direct importance estimation for covariate shift adaptation, 2008.

[2] Xuelong Li et al. A survey of graph edit distance, 2010, Pattern Analysis and Applications.

[3] Steffen Bickel et al. Discriminative learning for differing training and test distributions, 2007, ICML '07.

[4] Tyler Lu et al. Impossibility Theorems for Domain Adaptation, 2010, AISTATS.

[5] Geoffrey E. Hinton et al. Neighbourhood Components Analysis, 2004, NIPS.

[6] Peter A. Flach et al. Evaluation Measures for Multi-class Subgroup Discovery, 2009, ECML/PKDD.

[7] John Blitzer et al. Domain Adaptation with Structural Correspondence Learning, 2006, EMNLP.

[8] Koby Crammer et al. A theory of learning from different domains, 2010, Machine Learning.

[9] Paul Over et al. High-level feature detection from video in TRECVid: a 5-year retrospective of achievements, 2009.

[10] Neil D. Lawrence et al. Dataset Shift in Machine Learning, 2009.

[11] Hal Daumé. Frustratingly Easy Domain Adaptation, 2007, ACL.

[12] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[13] Marc Sebban et al. Learning probabilistic models of tree edit distance, 2008, Pattern Recognition.

[14] ChengXiang Zhai et al. Instance Weighting for Domain Adaptation in NLP, 2007, ACL.

[15] Maria-Florina Balcan et al. On a theory of learning with similarity functions, 2006, ICML.

[16] Shie Mannor et al. Robustness and generalization, 2010, Machine Learning.

[17] James J. Jiang. A Literature Survey on Domain Adaptation of Statistical Classifiers, 2007.

[18] Lorenzo Bruzzone et al. Domain Adaptation Problems: A DASVM Classification Technique and a Circular Validation Strategy, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19] Qiang Yang et al. Cross Validation Framework to Choose amongst Models and Datasets for Transfer Learning, 2010, ECML/PKDD.

[20] Qiang Yang et al. A Survey on Transfer Learning, 2010, IEEE Transactions on Knowledge and Data Engineering.

[21] Ajay Divakaran. Multimedia Content Analysis: Theory and Applications, 2008.

[22] Maria-Florina Balcan et al. Improved Guarantees for Learning via Similarity Functions, 2008, COLT.

[23] Peter N. Yianilos et al. Learning String-Edit Distance, 1996, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24] Kilian Q. Weinberger et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification, 2005, NIPS.

[25] Yishay Mansour et al. Domain Adaptation: Learning Bounds and Algorithms, 2009, COLT.

[26] Bernard Haasdonk. Feature space interpretation of SVMs with indefinite kernels, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27] Luc Van Gool et al. The Pascal Visual Object Classes (VOC) Challenge, 2010, International Journal of Computer Vision.

[28] Motoaki Kawanabe et al. Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation, 2007, NIPS.

[29] Ivor W. Tsang et al. Domain Adaptation via Transfer Component Analysis, 2009, IEEE Transactions on Neural Networks.

[30] Inderjit S. Dhillon et al. Information-theoretic metric learning, 2007, ICML '07.

[31] Stéphane Ayache et al. Image and Video Indexing Using Networks of Operators, 2007, EURASIP Journal on Image and Video Processing.

[32] Koby Crammer et al. Analysis of Representations for Domain Adaptation, 2006, NIPS.

[33] Jianguo Zhang et al. The PASCAL Visual Object Classes Challenge, 2006.

[34] Bernhard Schölkopf et al. Correcting Sample Selection Bias by Unlabeled Data, 2006, NIPS.