In LTE-A cellular networks there is a fundamental trade-off between cell throughput and the fairness levels achieved for preselected users that share the same amount of resources within one transmission time interval (TTI). A static parameterization of the Generalized Proportional Fair (GPF) scheduling rule cannot maintain a satisfactory fairness level at each TTI when a highly dynamic radio environment is considered. The novelty of this paper lies in finding the optimal policy of GPF parameters that satisfies the fairness criterion. For sustainability reasons, a multi-layer perceptron neural network (MLPNN) is used to map, at each TTI, the continuous multidimensional scheduler state to the desired GPF parameter. The MLPNN non-linear function is trained TTI-by-TTI through the interaction between the LTE scheduler and the proposed intelligent controller. This interaction is formulated as a reinforcement learning (RL) problem in which the LTE scheduler behavior is modeled as a Markov Decision Process (MDP). The continuous actor-critic learning automaton (CACLA) RL algorithm is proposed to select, at each TTI, the optimal continuous GPF parameter for the given MDP problem. The results indicate that CACLA accelerates convergence to the optimal fairness condition compared with other existing methods, while at the same time minimizing the number of TTIs in which the scheduler is declared unfair.
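To make the control loop concrete, the sketch below implements a CACLA-style per-TTI update in Python, together with the commonly used GPF priority form r^α / R̄^β. It is a minimal illustration, not the authors' implementation: linear function approximators stand in for the MLPNN, and the state features, hyperparameters, and the choice of the fairness exponent β as the controlled parameter are all assumptions.

```python
"""Minimal sketch of a CACLA-driven GPF parameter controller.
Linear approximators stand in for the paper's MLPNN; all names,
dimensions, and hyperparameters are illustrative assumptions."""
import numpy as np


def gpf_metric(r_inst, r_avg, alpha=1.0, beta=1.0):
    """Commonly used GPF priority r^alpha / R^beta: a larger beta weights
    fairness more heavily, a larger alpha weights throughput."""
    return (r_inst ** alpha) / (r_avg ** beta + 1e-9)


class CaclaController:
    """CACLA actor-critic: the critic learns V(s); the actor is nudged
    toward the explored action only when the TD error is positive."""

    def __init__(self, n_features, gamma=0.95, lr_v=0.01, lr_a=0.01, sigma=0.1):
        self.w_v = np.zeros(n_features)   # critic weights, V(s) ~ w_v . s
        self.w_a = np.zeros(n_features)   # actor weights, a(s) ~ w_a . s
        self.gamma, self.lr_v, self.lr_a, self.sigma = gamma, lr_v, lr_a, sigma
        self.rng = np.random.default_rng(0)

    def act(self, s):
        # Gaussian exploration around the actor's continuous output,
        # here interpreted as the GPF fairness exponent beta (assumption).
        return float(self.w_a @ s) + self.rng.normal(scale=self.sigma)

    def update(self, s, a, reward, s_next):
        delta = reward + self.gamma * (self.w_v @ s_next) - (self.w_v @ s)
        self.w_v += self.lr_v * delta * s               # critic: TD(0) update
        if delta > 0:                                   # CACLA rule: move actor
            self.w_a += self.lr_a * (a - self.w_a @ s) * s  # toward the action


# One hypothetical TTI of interaction with the scheduler environment:
ctrl = CaclaController(n_features=8)
s = np.random.default_rng(1).normal(size=8)   # scheduler state (illustrative)
a = ctrl.act(s)                               # chosen GPF parameter for this TTI
s_next, reward = s, 0.0                       # placeholders for the env step
ctrl.update(s, a, reward, s_next)
```

In an actual scheduler loop, the reward would be derived from the fairness criterion evaluated over the users' achieved throughputs, and the selected parameter would be plugged into `gpf_metric` to rank users at the next TTI.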