Fast Reinforcement Learning for Energy-Efficient Wireless Communication

We consider the problem of energy-efficient point-to-point transmission of delay-sensitive data (e.g., multimedia data) over a fading channel. We propose a rigorous and unified framework for simultaneously utilizing both physical-layer and system-level techniques to minimize energy consumption, under delay constraints, in the presence of stochastic and unknown traffic and channel conditions. We formulate the problem as a Markov decision process and solve it online using reinforcement learning. The advantages of the proposed online method are that i) it does not require a priori knowledge of the traffic arrival and channel statistics to determine the jointly optimal physical-layer and system-level power management strategies; ii) it exploits partial information about the system so that less information needs to be learned than when using conventional reinforcement learning algorithms; and iii) it obviates the need for action exploration, which severely limits the adaptation speed and run-time performance of conventional reinforcement learning algorithms.
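The exploration-free learning idea described above can be illustrated with a small post-decision-state (PDS) sketch. Everything below is a hypothetical stand-in chosen for illustration (the buffer/channel model, the cost functions, the arrival and channel dynamics), not the paper's exact formulation: the transmission cost is treated as known physical-layer structure, while only the value of the post-decision state, which absorbs the unknown traffic and channel dynamics, is learned from samples.

```python
import random

# Illustrative sketch of exploration-free, post-decision-state (PDS)
# learning for energy-efficient scheduling. The buffer/channel model,
# cost functions, and dynamics are hypothetical stand-ins, not the
# paper's exact formulation.

B = 8                 # buffer capacity (packets)
CHANNELS = [0, 1, 2]  # channel quality states (higher = better)
DELAY_W = 1.0         # weight on buffer (delay) cost
GAMMA = 0.9           # discount factor

def tx_power(z, h):
    """Known physical-layer cost of sending z packets in channel state h."""
    return z * z / (h + 1)  # convex in z, cheaper on good channels

def greedy(b, h, V):
    """Exploit the known cost structure: no action exploration needed."""
    return min(range(b + 1),
               key=lambda a: DELAY_W * b + tx_power(a, h)
               + GAMMA * V[(b - a, h)])

def pds_learn(steps=5000, seed=0):
    rng = random.Random(seed)
    # Value estimates indexed by post-decision state: the backlog after
    # transmission but before new arrivals and the channel transition.
    V = {(b, h): 0.0 for b in range(B + 1) for h in CHANNELS}
    b, h = 0, 1
    for n in range(1, steps + 1):
        alpha = 1.0 / (1 + n // 100)       # decaying step size
        z = greedy(b, h, V)                # greedy w.r.t. current estimate
        pds = (b - z, h)                   # post-decision state
        # Unknown dynamics are only sampled, never modeled explicitly.
        b_next = min(pds[0] + rng.choice([0, 1, 2]), B)
        h_next = rng.choice(CHANNELS)
        a_next = greedy(b_next, h_next, V)
        target = (DELAY_W * b_next + tx_power(a_next, h_next)
                  + GAMMA * V[(b_next - a_next, h_next)])
        V[pds] += alpha * (target - V[pds])  # stochastic-approximation step
        b, h = b_next, h_next
    return V

V = pds_learn()
```

Because the one-step transmission cost is computed directly from the known physical-layer model, the agent can always act greedily and learn only the part of the value function that depends on the unknown arrival and channel statistics, which is what removes the need for epsilon-greedy-style action exploration.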
