Fast Reinforcement Learning for Energy-Efficient Wireless Communication

We consider the problem of energy-efficient point-to-point transmission of delay-sensitive data (e.g., multimedia data) over a fading channel. We propose a rigorous and unified framework for simultaneously utilizing both physical-layer and system-level techniques to minimize energy consumption, under delay constraints, in the presence of stochastic and unknown traffic and channel conditions. We formulate the problem as a Markov decision process and solve it online using reinforcement learning. The advantages of the proposed online method are that i) it does not require a priori knowledge of the traffic arrival and channel statistics to determine the jointly optimal physical-layer and system-level power management strategies; ii) it exploits partial information about the system so that less information needs to be learned than when using conventional reinforcement learning algorithms; and iii) it obviates the need for action exploration, which severely limits the adaptation speed and run-time performance of conventional reinforcement learning algorithms.
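The exploration-free learning idea described above can be illustrated with a small post-decision-state (PDS) sketch. Everything below is a hypothetical stand-in chosen for illustration (the buffer/channel model, the cost functions, the arrival and channel dynamics), not the paper's exact formulation: the transmission cost is treated as known physical-layer structure, while only the value of the post-decision state, which absorbs the unknown traffic and channel dynamics, is learned from samples.

```python
import random

# Illustrative sketch of exploration-free, post-decision-state (PDS)
# learning for energy-efficient scheduling. The buffer/channel model,
# cost functions, and dynamics are hypothetical stand-ins, not the
# paper's exact formulation.

B = 8                 # buffer capacity (packets)
CHANNELS = [0, 1, 2]  # channel quality states (higher = better)
DELAY_W = 1.0         # weight on buffer (delay) cost
GAMMA = 0.9           # discount factor

def tx_power(z, h):
    """Known physical-layer cost of sending z packets in channel state h."""
    return z * z / (h + 1)  # convex in z, cheaper on good channels

def greedy(b, h, V):
    """Exploit the known cost structure: no action exploration needed."""
    return min(range(b + 1),
               key=lambda a: DELAY_W * b + tx_power(a, h)
               + GAMMA * V[(b - a, h)])

def pds_learn(steps=5000, seed=0):
    rng = random.Random(seed)
    # Value estimates indexed by post-decision state: the backlog after
    # transmission but before new arrivals and the channel transition.
    V = {(b, h): 0.0 for b in range(B + 1) for h in CHANNELS}
    b, h = 0, 1
    for n in range(1, steps + 1):
        alpha = 1.0 / (1 + n // 100)       # decaying step size
        z = greedy(b, h, V)                # greedy w.r.t. current estimate
        pds = (b - z, h)                   # post-decision state
        # Unknown dynamics are only sampled, never modeled explicitly.
        b_next = min(pds[0] + rng.choice([0, 1, 2]), B)
        h_next = rng.choice(CHANNELS)
        a_next = greedy(b_next, h_next, V)
        target = (DELAY_W * b_next + tx_power(a_next, h_next)
                  + GAMMA * V[(b_next - a_next, h_next)])
        V[pds] += alpha * (target - V[pds])  # stochastic-approximation step
        b, h = b_next, h_next
    return V

V = pds_learn()
```

Because the one-step transmission cost is computed directly from the known physical-layer model, the agent can always act greedily and learn only the part of the value function that depends on the unknown arrival and channel statistics, which is what removes the need for epsilon-greedy-style action exploration.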
