History Dependent Domain Adaptation

There are several ways to keep the new hypothesis w_{t+1} close to the previous hypothesis w_t:

- Average weights or model outputs (equivalent in the linear case). A linear combination of previous hypotheses gives us a simple baseline for comparison; exponential averaging is extremely easy to implement.
- Reduce divergence from previous hypotheses by using a small step size, or by taking fewer steps. In general, we might use an online learning algorithm.
- Add a regularization term which penalizes the model for differing from the previous model. The hinge loss term ∑_x max(0, 1 − h_t(x) w_{t−1}^T x) is equivalent to adding extra weighted examples to the data set.
- Perform full optimization subject to a hard constraint, e.g. ‖w_{t+1} − w_t‖ ≤ δ.
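The averaging baseline can be sketched as follows. This is a minimal illustration, not the author's implementation; `alpha` is a hypothetical smoothing rate. It also checks the claim that, for linear models, averaging weights and averaging outputs coincide.

```python
import numpy as np

def exponential_average(w_avg, w_new, alpha=0.1):
    """Exponentially weighted average of successive weight vectors.

    A small (hypothetical) alpha keeps the averaged hypothesis close
    to its history; alpha=1 ignores the history entirely.
    """
    return (1.0 - alpha) * w_avg + alpha * w_new

# In the linear case, averaging weights equals averaging outputs:
w_a = np.array([1.0, -2.0])
w_b = np.array([3.0, 0.0])
x = np.array([0.5, 1.5])
avg_w = 0.5 * w_a + 0.5 * w_b
assert np.isclose(avg_w @ x, 0.5 * (w_a @ x) + 0.5 * (w_b @ x))
```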
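The regularization strategy can be sketched with a subgradient step, under the assumption that the previous model supplies labels h_t(x) = sign(w_{t−1}·x): the history hinge term then acts exactly like extra examples pseudo-labeled by the previous model and weighted by `lam`. The function names and the step size `lr` are hypothetical.

```python
import numpy as np

def hinge_grad(w, X, y):
    """Subgradient of sum_i max(0, 1 - y_i * (w . x_i))."""
    margins = y * (X @ w)
    active = margins < 1.0  # examples inside the margin
    return -(X[active] * y[active, None]).sum(axis=0)

def step_with_history_penalty(w, w_prev, X, y, lam=1.0, lr=0.01):
    """One subgradient step on the hinge loss plus a history term.

    The history term is the hinge loss of w on X pseudo-labeled by the
    previous model, sign(X @ w_prev), weighted by lam -- equivalent to
    adding extra weighted examples to the data set.
    """
    y_prev = np.sign(X @ w_prev)
    g = hinge_grad(w, X, y) + lam * hinge_grad(w, X, y_prev)
    return w - lr * g
```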
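The hard-constraint variant can be enforced by projecting each fully optimized solution back onto the ball ‖w − w_t‖ ≤ δ around the previous weights. A minimal sketch, with a hypothetical radius `delta`:

```python
import numpy as np

def project_to_ball(w, w_prev, delta):
    """Project w onto {v : ||v - w_prev|| <= delta}.

    If w already satisfies the constraint it is returned unchanged;
    otherwise it is pulled back to the boundary of the ball.
    """
    diff = w - w_prev
    norm = np.linalg.norm(diff)
    if norm <= delta:
        return w
    return w_prev + (delta / norm) * diff
```

Projection after an unconstrained solve is one simple way to realize the constraint; a solver that handles the constraint directly would serve equally well.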