History Dependent Domain Adaptation

There are several ways to keep the new hypothesis w_{t+1} close to the previous hypothesis w_t:

- Average weights or model outputs (equivalent in the linear case). A linear combination of previous hypotheses gives us a simple baseline for comparison; exponential averaging is extremely easy to implement.
- Reduce divergence from previous hypotheses by using a small step size, or by taking fewer steps. In general, we might use an online learning algorithm.
- Add a regularization term which penalizes the model for differing from the previous model. The hinge loss term ∑_x max(0, 1 − h_t(x) w_{t−1}^T x) is equivalent to adding extra weighted examples to the data set.
- Perform full optimization subject to a hard constraint, e.g. ‖w_{t+1} − w_t‖ ≤ δ.
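The averaging baseline can be sketched as follows. This is a minimal illustration, not the author's implementation; `alpha` is a hypothetical smoothing rate. It also checks the claim that, for linear models, averaging weights and averaging outputs coincide.

```python
import numpy as np

def exponential_average(w_avg, w_new, alpha=0.1):
    """Exponentially weighted average of successive weight vectors.

    A small (hypothetical) alpha keeps the averaged hypothesis close
    to its history; alpha=1 ignores the history entirely.
    """
    return (1.0 - alpha) * w_avg + alpha * w_new

# In the linear case, averaging weights equals averaging outputs:
w_a = np.array([1.0, -2.0])
w_b = np.array([3.0, 0.0])
x = np.array([0.5, 1.5])
avg_w = 0.5 * w_a + 0.5 * w_b
assert np.isclose(avg_w @ x, 0.5 * (w_a @ x) + 0.5 * (w_b @ x))
```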
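The regularization strategy can be sketched with a subgradient step, under the assumption that the previous model supplies labels h_t(x) = sign(w_{t−1}·x): the history hinge term then acts exactly like extra examples pseudo-labeled by the previous model and weighted by `lam`. The function names and the step size `lr` are hypothetical.

```python
import numpy as np

def hinge_grad(w, X, y):
    """Subgradient of sum_i max(0, 1 - y_i * (w . x_i))."""
    margins = y * (X @ w)
    active = margins < 1.0  # examples inside the margin
    return -(X[active] * y[active, None]).sum(axis=0)

def step_with_history_penalty(w, w_prev, X, y, lam=1.0, lr=0.01):
    """One subgradient step on the hinge loss plus a history term.

    The history term is the hinge loss of w on X pseudo-labeled by the
    previous model, sign(X @ w_prev), weighted by lam -- equivalent to
    adding extra weighted examples to the data set.
    """
    y_prev = np.sign(X @ w_prev)
    g = hinge_grad(w, X, y) + lam * hinge_grad(w, X, y_prev)
    return w - lr * g
```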
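The hard-constraint variant can be enforced by projecting each fully optimized solution back onto the ball ‖w − w_t‖ ≤ δ around the previous weights. A minimal sketch, with a hypothetical radius `delta`:

```python
import numpy as np

def project_to_ball(w, w_prev, delta):
    """Project w onto {v : ||v - w_prev|| <= delta}.

    If w already satisfies the constraint it is returned unchanged;
    otherwise it is pulled back to the boundary of the ball.
    """
    diff = w - w_prev
    norm = np.linalg.norm(diff)
    if norm <= delta:
        return w
    return w_prev + (delta / norm) * diff
```

Projection after an unconstrained solve is one simple way to realize the constraint; a solver that handles the constraint directly would serve equally well.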