A New QoS Provisioning Method for Adaptive Multimedia in Wireless Networks

Future wireless networks are designed to support adaptive multimedia by controlling individual ongoing flows to increase or decrease their bandwidths in response to changes in traffic load. There is growing interest in quality-of-service (QoS) provisioning under this adaptive multimedia framework, in which a bandwidth adaptation algorithm must be used in conjunction with the call admission control algorithm. This paper presents a novel method for QoS provisioning via average reward reinforcement learning combined with stochastic approximation, which can maximize the network revenue subject to several predetermined QoS constraints. Unlike model-based algorithms (e.g., linear programming), our scheme does not require explicit state transition probabilities, and therefore the assumptions behind the underlying system model are more realistic than those in previous schemes. In addition, by taking the status of neighboring cells into account, the proposed scheme can dynamically adapt to changes in traffic conditions. Moreover, the algorithm can effectively control the bandwidth adaptation frequency by accounting for the cost of bandwidth switching in the model. The effectiveness of the proposed approach is demonstrated through simulations of adaptive multimedia wireless networks.
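To make the idea of average reward reinforcement learning concrete, the following is a minimal sketch of tabular R-learning, a standard model-free average-reward method, applied to a toy admission control problem. It is not the scheme proposed in the paper: it omits the QoS constraints, the stochastic approximation step, neighboring-cell status, and the bandwidth switching cost. The environment `toy_cell_step`, the constants `CAPACITY` and `MAX_CALLS`, the reward shape, and all step sizes are assumptions chosen only for illustration.

```python
import random

CAPACITY = 3   # hypothetical: number of calls served at full bandwidth
MAX_CALLS = 5  # hypothetical: hard limit on calls in the cell


def toy_cell_step(n, action, rng):
    """One event in a toy cell; n is the number of ongoing calls."""
    if rng.random() < 0.5:                 # a new call arrives
        if action == 1 and n < MAX_CALLS:  # action 1 = admit, 0 = reject
            n += 1
    elif n > 0:                            # otherwise an ongoing call departs
        n -= 1
    # Revenue grows with the number of calls; calls beyond CAPACITY
    # force degraded bandwidth, modeled here as a penalty.
    reward = n - 3 * max(0, n - CAPACITY)
    return reward, n


def r_learning(step, n_states, n_actions, n_steps=20000,
               alpha=0.1, beta=0.01, epsilon=0.1, seed=0):
    """Tabular R-learning: learns action values relative to the
    average reward rho, without any transition probabilities."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    rho = 0.0  # running estimate of the average reward per step
    s = 0
    for _ in range(n_steps):
        best_here = max(q[s])
        greedy = q[s].index(best_here)
        # Epsilon-greedy exploration.
        a = rng.randrange(n_actions) if rng.random() < epsilon else greedy
        r, s2 = step(s, a, rng)
        best_next = max(q[s2])
        # Relative value update: reward is measured against rho.
        q[s][a] += alpha * (r - rho + best_next - q[s][a])
        # Update the average reward estimate only on greedy actions.
        if a == greedy:
            rho += beta * (r - rho + best_next - best_here)
        s = s2
    return q, rho


if __name__ == "__main__":
    q, rho = r_learning(toy_cell_step, MAX_CALLS + 1, 2)
    policy = [row.index(max(row)) for row in q]
    print("estimated average reward:", rho)
    print("greedy action per state (0=reject, 1=admit):", policy)
```

The learner only ever calls `step` to sample the next state and reward, which is the sense in which such methods avoid explicit state transition probabilities. Handling QoS constraints would require additional machinery that this sketch does not attempt.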
