Approximate dynamic programming using fluid and diffusion approximations with applications to power management

TD learning and its refinements are powerful tools for approximating the solution to dynamic programming problems. However, these techniques provide an approximate solution only within a prescribed finite-dimensional function class. Thus, the question that always arises is: how should the function class be chosen? The goal of this paper is to propose an approach to TD learning in which the function class is chosen using the solutions to associated fluid and diffusion approximations. To illustrate this new approach, the paper focuses on an application to dynamic speed scaling for power management.
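
The idea of choosing the function class from a fluid approximation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a single queue with random arrivals, a one-step cost of backlog plus a quadratic power penalty, a fixed square-root service policy, and a basis that contains an x^(3/2) term. The x^(3/2) term is motivated by a simplified fluid model without arrivals, dq/dt = -u with cost q + u^2/2, whose total-cost value function is proportional to q^(3/2). The sketch runs plain TD(0) with a linear function class to evaluate the fixed policy under a discounted cost; the names `features`, `policy`, `cost`, and `td0` are hypothetical.

```python
import math
import random


def features(x, scale=10.0):
    """Basis functions: constant, linear, and a fluid-inspired x^(3/2) term.

    The 3/2 power is an assumption borrowed from a simplified fluid model;
    `scale` only keeps the feature magnitudes moderate.
    """
    z = x / scale
    return [1.0, z, z ** 1.5]


def policy(x):
    """Hypothetical fixed policy: serve floor(sqrt(x)) jobs per time step."""
    return math.isqrt(x)


def cost(x, u):
    """Assumed one-step cost: backlog plus a quadratic power penalty."""
    return x + 0.5 * u * u


def td0(num_steps=100_000, gamma=0.9, step=0.01, seed=0):
    """TD(0) with linear function approximation for the fixed policy."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    x = 0
    for _ in range(num_steps):
        u = policy(x)
        # Assumed arrival distribution with mean 0.7 jobs per step.
        arrivals = rng.choices([0, 1, 2], weights=[0.5, 0.3, 0.2])[0]
        x_next = x - u + arrivals
        phi = features(x)
        phi_next = features(x_next)
        v = sum(t * p for t, p in zip(theta, phi))
        v_next = sum(t * p for t, p in zip(theta, phi_next))
        # Temporal difference for the discounted-cost criterion.
        delta = cost(x, u) + gamma * v_next - v
        theta = [t + step * delta * p for t, p in zip(theta, phi)]
        x = x_next
    return theta


if __name__ == "__main__":
    weights = td0()
    print("fitted weights:", weights)
```

The fitted weights define an approximate value function as a linear combination of the three basis functions. Replacing the x^(3/2) term with another candidate and comparing the resulting approximations is one simple way to probe how much the choice of function class matters in this toy setting.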
