Efficient QoS Provisioning for Adaptive Multimedia in Mobile Communication Networks by Reinforcement Learning

The scarcity and large fluctuations of link bandwidth in wireless networks have motivated the development of adaptive multimedia services in mobile communication networks, where the bandwidth of individual ongoing flows can be increased or decreased. This paper studies quality of service (QoS) provisioning in such systems. In particular, call admission control and bandwidth adaptation are formulated as a constrained Markov decision problem. The rapid growth in the number of states and the difficulty of estimating state transition probabilities in practical systems make it very difficult to find the optimal policy with classical methods. We present a novel approach that uses a form of reinforcement learning known as Q-learning to solve QoS provisioning for wireless adaptive multimedia. Q-learning does not require an explicit state transition model to solve the Markov decision problem, so the underlying system model can rest on more general and realistic assumptions than in previous schemes. Moreover, the proposed scheme can efficiently handle the large state space and action set of the wireless adaptive multimedia QoS provisioning problem. Handoff dropping probability and average allocated bandwidth are treated as QoS constraints in our model, and both can be guaranteed simultaneously. Simulation results demonstrate the effectiveness of the proposed scheme in adaptive multimedia mobile communication networks.
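To illustrate why Q-learning needs no transition model, the following is a minimal sketch of tabular Q-learning applied to a toy single-cell admission problem. It is not the formulation used in this paper: it assumes a plain discounted Q-learning update, a single bandwidth level per call, and a fixed penalty for dropped handoffs as a crude stand-in for the handoff dropping constraint. All names (`q_update`, `train`, `ACTIONS`) and all numeric parameters are hypothetical choices made for illustration.

```python
import random

ACTIONS = ("reject", "accept")

def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """One tabular step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

def train(capacity=4, steps=5000, epsilon=0.1, depart_prob=0.5, seed=0):
    """Learn Q-values from simulated arrivals only; no transition
    probabilities are ever estimated or stored."""
    rng = random.Random(seed)
    q = {}
    used, event = 0, "new"  # state = (occupied units, arrival type)
    for _ in range(steps):
        state = (used, event)
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        admitted = action == "accept" and used < capacity
        if admitted:
            used += 1
            reward = 1.0
        elif event == "handoff":
            reward = -5.0  # assumed penalty for a dropped handoff
        else:
            reward = 0.0   # blocked new call
        # Each ongoing call ends independently before the next arrival.
        used -= sum(rng.random() < depart_prob for _ in range(used))
        event = rng.choice(("new", "handoff"))
        q_update(q, state, action, reward, (used, event), ACTIONS)
    return q
```

After training, a greedy policy can be read off the table, for example `max(ACTIONS, key=lambda a: q.get(((2, "handoff"), a), 0.0))` for a handoff arriving when two units are occupied. The agent only ever sees sampled transitions, which is the property the abstract relies on when it says no explicit state transition model is required.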
