Reinforcement Learning for Call Admission Control and Routing under Quality of Service Constraints in Multimedia Networks

In this paper, we solve the call admission control and routing problem in multimedia networks via reinforcement learning (RL). The objective is to maximize network revenue while meeting quality of service constraints, which forbid entry into certain states and the use of certain actions. We formulate the problem as a constrained semi-Markov decision process, and we show that RL provides a solution that earns significantly higher revenue than alternative heuristics.
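To make the formulation concrete, the following is a minimal sketch of how state and action constraints can enter an RL update. It is not the paper's algorithm: it assumes a single link, two call classes, a simple capacity rule standing in for the quality of service constraint, and a discounted semi-Markov Q-learning update. All constants and function names (`CAPACITY`, `allowed_actions`, `next_arrival`, `train`) are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import defaultdict

# All values below are illustrative assumptions, not taken from the paper.
CAPACITY = 4                  # link bandwidth units
BANDWIDTH = (1, 2)            # bandwidth needed by a call of each class
REWARD = (1.0, 3.0)           # revenue earned when a call of each class is accepted
ARRIVAL_RATE = (1.0, 0.5)     # Poisson arrival rate per class
DEPARTURE_RATE = (1.0, 1.0)   # per-call departure rate per class
REJECT, ACCEPT = 0, 1


def load(counts):
    """Bandwidth currently in use for the given per-class call counts."""
    return sum(n * b for n, b in zip(counts, BANDWIDTH))


def allowed_actions(counts, k):
    """Constraint: accepting is forbidden if it would enter an over-capacity state."""
    if load(counts) + BANDWIDTH[k] <= CAPACITY:
        return [REJECT, ACCEPT]
    return [REJECT]


def next_arrival(counts, rng):
    """Simulate departures until the next arrival.

    Returns (counts at arrival, arriving class, elapsed time).
    """
    counts = list(counts)
    elapsed = 0.0
    while True:
        # Candidate events with strictly positive rates only.
        events = [("arrive", k, r) for k, r in enumerate(ARRIVAL_RATE)]
        events += [("depart", k, n * DEPARTURE_RATE[k])
                   for k, n in enumerate(counts) if n > 0]
        total = sum(r for _, _, r in events)
        elapsed += rng.expovariate(total)
        kind, k, _ = rng.choices(events, weights=[r for _, _, r in events])[0]
        if kind == "arrive":
            return tuple(counts), k, elapsed
        counts[k] -= 1


def train(steps=20000, alpha=0.1, beta=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning over arrival events, restricted to allowed actions."""
    rng = random.Random(seed)
    q = defaultdict(float)  # key: (counts, arriving class, action)
    counts, k, _ = next_arrival((0, 0), rng)
    for _ in range(steps):
        actions = allowed_actions(counts, k)
        if rng.random() < epsilon:
            a = rng.choice(actions)  # explore among allowed actions only
        else:
            a = max(actions, key=lambda x: q[(counts, k, x)])
        reward = REWARD[k] if a == ACCEPT else 0.0
        after = tuple(n + (1 if a == ACCEPT and i == k else 0)
                      for i, n in enumerate(counts))
        nxt, nk, tau = next_arrival(after, rng)
        # The maximum is taken over allowed actions only, so forbidden
        # actions never receive a value.
        best_next = max(q[(nxt, nk, x)] for x in allowed_actions(nxt, nk))
        # Discount by the random sojourn time tau (semi-Markov setting).
        target = reward + math.exp(-beta * tau) * best_next
        q[(counts, k, a)] += alpha * (target - q[(counts, k, a)])
        counts, k = nxt, nk
    return q
```

In this sketch the constraint is enforced by masking: forbidden actions are excluded both when selecting an action and when bootstrapping from the next state, so the learned policy cannot enter a forbidden state. Calling `train()` returns a table keyed by `(counts, class, action)` from which a greedy admission policy can be read off.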
