Resisting Dynamic Strategies in Gradually Evolving Worlds

We study the online linear optimization problem, in which a player makes repeated online decisions against linear loss functions and aims to achieve small regret. We consider a natural restriction of this problem in which the loss functions have small deviation, measured by the sum of the distances between every two consecutive loss functions. At the same time, we consider a natural generalization, in which regret is measured against a dynamic offline algorithm that may play different strategies in different rounds, under the constraint that its deviation is small. We show that in this new setting, an online algorithm modified from the gradient descent algorithm can still achieve small regret, which can be characterized in terms of the deviation of the loss functions and the deviation of the offline algorithm. For the closely related online decision problem, we show that an online algorithm modified from the Hedge algorithm can likewise achieve small regret in this new setting.
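The paper's modified algorithms are not reproduced here. As background for the setting described above, the following is a minimal sketch of standard online gradient descent for linear losses over the Euclidean unit ball, run against a loss sequence whose consecutive loss vectors stay close (i.e., small deviation). All names, the step size, and the drift parameter are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    # Euclidean projection onto the ball of the given radius.
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def online_gradient_descent(loss_vectors, eta=0.1):
    # Plays x_t, then takes a gradient step on the linear loss
    # f_t(x) = <g_t, x> and projects back onto the unit ball.
    d = len(loss_vectors[0])
    x = np.zeros(d)
    plays = []
    for g in loss_vectors:
        plays.append(x.copy())
        x = project_l2_ball(x - eta * np.asarray(g))
    return plays

# A slowly drifting loss sequence: consecutive losses are close,
# so the total deviation sum_t ||g_t - g_{t-1}|| is small.
rng = np.random.default_rng(0)
g = rng.normal(size=3)
losses = []
for _ in range(100):
    g = g + 0.01 * rng.normal(size=3)  # small per-round drift
    losses.append(g.copy())

plays = online_gradient_descent(losses)
online_loss = sum(np.dot(g, x) for g, x in zip(losses, plays))
# Best fixed action in hindsight on the unit ball is -G/||G||
# for G = sum_t g_t, with total loss -||G||.
best_fixed_loss = -np.linalg.norm(sum(losses))
static_regret = online_loss - best_fixed_loss
```

The paper's setting replaces the fixed comparator here with a dynamic sequence of comparators of small deviation, against which a modified version of this kind of algorithm is analyzed.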