Online fractional programming for Markov decision systems

We consider a system with K states that operates over frames of variable length. At the start of each frame, the controller observes a new random event and then chooses a control action based on this observation. The current state, random event, and control action together determine: (i) the frame size, (ii) a vector of penalties incurred over the frame, and (iii) the transition probabilities to the next state visited at the end of the frame. The goal is to minimize the time average of one penalty subject to time-average constraints on the others. This problem has applications to task scheduling in computer systems and wireless networks, where each task can take a different amount of time and may change the state of the network. An example is energy-optimal scheduling in a system with several energy-saving transmission modes, where transitions between modes incur energy and/or delay penalties. We pose the problem as a stochastic linear fractional program and present an online Lyapunov drift method for solving it. For large classes of problems, the solution can be implemented without any knowledge of the random event probabilities.
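To make the frame-based structure concrete, the following is a minimal sketch of a drift-plus-penalty ratio rule of the kind the abstract describes, on a hypothetical two-action toy problem (the functions `frame_outcome`, the parameters `V` and `C1`, and the specific penalty values are illustrative assumptions, not the paper's model; in particular this sketch omits the Markov state and keeps only the random event, frame length, and penalties). Each frame, the controller observes a random event, picks the action minimizing a weighted penalty normalized by frame length, and updates a virtual queue that enforces the time-average constraint. Note that the decision rule uses only the observed event, not its probability distribution.

```python
import random

# Hypothetical toy problem: each frame, a random event omega in {0, 1} is
# observed. The action a in {0, 1} determines the frame length T, a "power"
# penalty p0, and a "delay" penalty p1. Goal: minimize the time average of
# p0 subject to the time average of p1 being at most C1 per unit time.

V = 50.0   # tradeoff parameter: larger V pushes closer to optimal p0
C1 = 1.0   # constraint budget: avg(p1) / avg(T) <= C1

def frame_outcome(action, omega):
    """Hypothetical (T, p0, p1) as a function of action and random event."""
    if action == 0:
        return (1.0, 2.0 + omega, 0.5)        # short frame, high power, low delay
    else:
        return (2.0, 1.0 + 0.5 * omega, 3.0)  # long frame, low power, high delay

def choose_action(Q1, omega):
    # Ratio rule for variable frame lengths: minimize (V*p0 + Q1*p1) / T
    # over actions. This needs no knowledge of the distribution of omega.
    best_a, best_val = None, float("inf")
    for a in (0, 1):
        T, p0, p1 = frame_outcome(a, omega)
        val = (V * p0 + Q1 * p1) / T
        if val < best_val:
            best_a, best_val = a, val
    return best_a

def run(frames=20000, seed=0):
    rng = random.Random(seed)
    Q1 = 0.0  # virtual queue tracking accumulated constraint violation
    total_T = total_p0 = total_p1 = 0.0
    for _ in range(frames):
        omega = rng.randint(0, 1)   # observed random event for this frame
        a = choose_action(Q1, omega)
        T, p0, p1 = frame_outcome(a, omega)
        total_T += T
        total_p0 += p0
        total_p1 += p1
        # Queue update: backlog grows when p1 exceeds its budget C1*T.
        Q1 = max(Q1 + p1 - C1 * T, 0.0)
    return total_p0 / total_T, total_p1 / total_T

avg_p0, avg_p1 = run()
```

In this toy example the low-power action alone would violate the delay budget, so the virtual queue forces the controller to time-share between the two actions: the long-run p1 rate settles near the budget C1 while the p0 rate drops below what the constraint-safe action alone would achieve.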
