Regret Minimization for Online Buffering Problems Using the Weighted Majority Algorithm

Suppose a decision maker has to purchase a commodity over time with varying prices and demands. In particular, the price per unit might depend on the amount purchased, and this price function might vary from step to step. The decision maker has a buffer of bounded size for storing units of the commodity that can be used to satisfy demands at later points in time. We seek an algorithm that decides when to buy which amount of the commodity so as to minimize the cost. Problems of this kind arise in many technological and economic settings, e.g., battery management in hybrid cars and economical caching policies for mobile devices. A simplified but illustrative example is a frugal car driver deciding on which occasion to buy which amount of gasoline. Within a regret analysis, we assume that the decision maker can observe the performance of a set of expert strategies over time and synthesizes the observed strategies into a new online algorithm. In particular, we investigate the external regret obtained by the well-known Randomized Weighted Majority algorithm applied to our problem. We show that this algorithm does not achieve a reasonable regret bound if its random choices are independent from step to step; that is, the regret for T steps is Ω(T). However, one can achieve regret O(√T) by introducing dependencies that reduce the number of changes between the chosen experts. If the price functions satisfy a convexity condition, then one can even derive a deterministic variant of this algorithm achieving regret O(√T). Our more detailed bounds on the regret depend on the buffer size and the number of available experts. The upper bounds are complemented by a matching lower bound on the best possible external regret.

∗Supported by the DFG GK/1298 “AlgoSyn” and UMIC Research Center
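To illustrate the contrast between independent and dependent random choices described in the abstract, the following is a minimal sketch and not the algorithm as specified in the paper. It assumes losses in [0, 1], a multiplicative weight update with a hypothetical learning rate eta, and one common way of introducing dependencies: the current expert is kept with probability equal to the ratio of its new weight to its old weight, and a fresh expert is drawn from the weight distribution otherwise. The buffer and the price functions are not modeled; the function names and parameters are illustrative only.

```python
import random


def weighted_majority_choices(losses, eta=0.1, lazy=True, seed=0):
    """Choose one expert per step from a table of expert losses.

    losses[t][i] is the loss of expert i at step t, assumed to lie in [0, 1].
    With lazy=False the expert is redrawn independently at every step.
    With lazy=True the current expert is kept with probability
    new_weight / old_weight (an assumed dependency scheme, see lead-in).
    """
    rng = random.Random(seed)
    n = len(losses[0])
    weights = [1.0 / n] * n
    current = rng.randrange(n)  # uniform start, matching the equal weights
    choices = []
    for step_losses in losses:
        choices.append(current)  # the choice is fixed before losses are seen
        # Multiplicative update: an expert with loss l is scaled by (1 - eta)^l.
        weights = [w * (1.0 - eta) ** l for w, l in zip(weights, step_losses)]
        # Ratio of the current expert's new weight to its old weight.
        keep_prob = (1.0 - eta) ** step_losses[current] if lazy else 0.0
        # Renormalize so the weights do not underflow on long sequences.
        total = sum(weights)
        weights = [w / total for w in weights]
        if rng.random() >= keep_prob:
            current = rng.choices(range(n), weights=weights)[0]
    return choices


def count_switches(choices):
    """Number of steps at which the chosen expert changes."""
    return sum(a != b for a, b in zip(choices, choices[1:]))
```

For instance, calling weighted_majority_choices on the same loss table with lazy=True and lazy=False and comparing count_switches on the two results gives a rough picture of how much the dependency reduces changes between experts. In a buffering problem each change can be costly because the buffer state reached under one expert need not suit the next.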
