A simple multi-armed bandit algorithm with optimal variation-bounded regret

We pose the question of whether it is possible to design a simple, linear-time algorithm for the basic multi-armed bandit problem in the adversarial setting which has a regret bound of $O(\sqrt{Q \log T})$, where $Q$ is the total quadratic variation of all the arms.