Efficient algorithms for online decision problems

In an online decision problem, one makes a sequence of decisions without knowledge of the future. Tools from learning such as Weighted Majority and its many variants [4,13,18] demonstrate that online algorithms can perform nearly as well as the best single decision chosen in hindsight, even when there are exponentially many possible decisions. However, the naive application of these algorithms is inefficient for such large problems. For some problems with nice structure, specialized efficient solutions have been developed [3,6,10,16,17]. We show that a very simple idea, used in Hannan's seminal 1957 paper [9], gives efficient solutions to all of these problems. Essentially, in each period, one chooses the decision that worked best in the past. To guarantee low regret, it is necessary to add randomness. Surprisingly, this simple approach gives additive ε regret per period, efficiently. We present a simple general analysis and several extensions, including a (1+ε)-competitive algorithm as well as a lazy one that rarely switches between decisions.
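
The idea of choosing the decision that worked best in the past, after adding randomness, can be illustrated in the simplest setting of a small finite set of decisions. The following is a minimal sketch, not the paper's exact algorithm: the function name, the per-period cost vectors with entries in [0, 1], and the fresh uniform perturbation drawn from [0, 1/epsilon] for each decision are assumptions made for illustration.

```python
import random


def follow_the_perturbed_leader(cost_rounds, epsilon, rng=None):
    """Illustrative sketch: each period, pick the decision whose past
    cumulative cost plus a random perturbation is smallest.

    cost_rounds: list of per-period cost vectors (assumed entries in [0, 1]).
    epsilon: assumed tuning parameter; perturbations are uniform on [0, 1/epsilon].
    Returns (choices, algorithm_cost, best_fixed_cost_in_hindsight).
    """
    rng = rng or random.Random(0)
    n = len(cost_rounds[0])
    cumulative = [0.0] * n  # total past cost of each decision
    choices = []
    algorithm_cost = 0.0

    for costs in cost_rounds:
        # Perturb the past totals, then follow the (perturbed) leader.
        perturbed = [cumulative[i] + rng.uniform(0.0, 1.0 / epsilon)
                     for i in range(n)]
        choice = min(range(n), key=perturbed.__getitem__)
        choices.append(choice)
        algorithm_cost += costs[choice]

        # Costs are revealed only after the decision has been made.
        for i in range(n):
            cumulative[i] += costs[i]

    return choices, algorithm_cost, min(cumulative)


if __name__ == "__main__":
    # Hypothetical data: decision 0 is always free, decision 1 always costs 1.
    rounds = [[0.0, 1.0]] * 200
    choices, alg_cost, best_cost = follow_the_perturbed_leader(rounds, epsilon=0.5)
    print("algorithm cost:", alg_cost, "best fixed decision cost:", best_cost)
```

In this sketch, a larger perturbation range (smaller epsilon) makes the choice less sensitive to any single period's costs, at the price of following the true leader more slowly. For structured problems with exponentially many decisions, the explicit minimum over decisions would be replaced by an efficient offline optimizer; that step is not shown here.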

[1] Robert E. Tarjan, et al. Self-adjusting binary search trees, 1985, JACM.

[2] Donald E. Knuth, et al. Dynamic Huffman Coding, 1985, J. Algorithms.

[3] Robert E. Tarjan, et al. Amortized efficiency of list update and paging rules, 1985, CACM.

[4] Nick Littlestone, et al. From on-line to batch learning, 1989, COLT '89.

[5] Vladimir Vovk, et al. Aggregating strategies, 1990, COLT '90.

[6] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.

[7] Robert E. Schapire, et al. Predicting Nearly As Well As the Best Pruning of a Decision Tree, 1995, COLT '95.

[8] Nathan Linial, et al. The geometry of graphs and some of its algorithmic applications, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[9] David P. Williamson, et al. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, 1995, JACM.

[10] T. Cover. Universal Portfolios, 1996.

[11] Avrim Blum, et al. On-line Algorithms in Machine Learning, 1996, Online Algorithms.

[12] Sanjeev Arora, et al. Polynomial time approximation schemes for Euclidean TSP and other geometric problems, 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[13] Manfred K. Warmuth, et al. How to use expert advice, 1997, JACM.

[14] Yoram Singer, et al. Using and combining predictors that specialize, 1997, STOC '97.

[15] Avrim Blum, et al. On-line Learning and the Metrical Task System Problem, 1997, COLT '97.

[16] Satish Rao, et al. Small distortion and volume preserving embeddings for planar and Euclidean metrics, 1999, SCG '99.

[17] Manfred K. Warmuth, et al. Predicting nearly as well as the best pruning of a planar decision graph, 2002, Theor. Comput. Sci.

[18] Dean P. Foster, et al. Regret in the On-Line Decision Problem, 1999.

[19] Uriel Feige, et al. Approximating the Bandwidth via Volume Respecting Embeddings, 2000, J. Comput. Syst. Sci.

[20] Santosh S. Vempala, et al. On Euclidean Embeddings and Bandwidth Minimization, 2001, RANDOM-APPROX.

[21] Adam Tauman Kalai, et al. Static Optimality and Dynamic Search-Optimality in Lists and Trees, 2002, SODA '02.

[22] Éva Tardos, et al. Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields, 2002, JACM.

[23] Adam Tauman Kalai, et al. Geometric algorithms for online optimization, 2002.

[24] Manfred K. Warmuth, et al. Path Kernels and Multiplicative Updates, 2002, J. Mach. Learn. Res.

[25] Vijay V. Vazirani, et al. Approximation Algorithms, 2001, Springer Berlin Heidelberg.

[26] Baruch Awerbuch, et al. Adapting to a reliable network path, 2003, PODC '03.

[27] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[28] Baruch Awerbuch, et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches, 2004, STOC '04.

[29] Avrim Blum, et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary, 2004, COLT.