Domain Adaptation in Regression

This paper presents a series of new results for domain adaptation in the regression setting. We prove that the discrepancy is a distance for the squared loss when the hypothesis set is the reproducing kernel Hilbert space induced by a universal kernel such as the Gaussian kernel. We give new pointwise loss guarantees based on the discrepancy of the empirical source and target distributions for the general class of kernel-based regularization algorithms. These bounds have a simpler form than previous results and hold for a broader class of convex loss functions that are not necessarily differentiable, including L_q losses and the hinge loss. We extend the discrepancy minimization adaptation algorithm to the more significant case in which kernels are used, and show that the problem can be cast as a semidefinite program (SDP) similar to the one in the feature space. We also show that techniques from smooth optimization can be used to derive an efficient algorithm for solving such SDPs, even for very high-dimensional feature spaces. We have implemented this algorithm and report the results of experiments demonstrating its benefits for adaptation; unlike previous algorithms, it scales to large data sets of tens of thousands of points or more.
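To make the discrepancy concrete, the following is a minimal sketch of computing the empirical discrepancy in the simplest setting: the squared loss with linear hypotheses h(x) = w · x, ||w|| ≤ Λ, where the quantity reduces to (2Λ)² times the spectral norm of the difference of the empirical second-moment matrices of the source and target samples. This linear case is an assumption for illustration; the kernelized, SDP-based formulation developed in the paper generalizes it. The function name and the Λ default are illustrative, not the paper's code.

```python
import numpy as np

def empirical_discrepancy(X_src, X_tgt, lam=1.0):
    """Empirical discrepancy for the squared loss with linear hypotheses
    H = {x -> w.x : ||w|| <= lam} (illustrative sketch, not the paper's code).

    For this hypothesis set, disc(P_hat, Q_hat) equals
    (2*lam)**2 * ||M_src - M_tgt||_2, i.e. the largest absolute eigenvalue
    of the difference of the empirical second-moment matrices.
    """
    M_src = X_src.T @ X_src / X_src.shape[0]  # (d, d) second moments, source
    M_tgt = X_tgt.T @ X_tgt / X_tgt.shape[0]  # (d, d) second moments, target
    eigvals = np.linalg.eigvalsh(M_src - M_tgt)  # difference is symmetric
    return (2.0 * lam) ** 2 * np.max(np.abs(eigvals))

# Usage on synthetic samples: the target has a larger variance, so the
# discrepancy between the two empirical distributions is nonzero.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(500, 10))
X_tgt = 1.5 * rng.normal(size=(500, 10))
print(empirical_discrepancy(X_src, X_tgt))
```

The reduction follows because any pair h, h' in H differs by u · x with ||u|| ≤ 2Λ, and E[(u · x)²] = uᵀ M u, so maximizing the difference of expected squared losses over the hypothesis pair is a symmetric eigenvalue problem.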
