Adaptive proportional fair parameterization based LTE scheduling using continuous actor-critic reinforcement learning

Maintaining a desired trade-off between system throughput maximization and user fairness satisfaction remains an open problem. In LTE systems, different trade-off levels can be obtained through a proper parameterization of the Generalized Proportional Fair (GPF) scheduling rule. Our approach finds the parameterization policy that maximizes system throughput under different fairness constraints imposed by the scheduler state. The proposed method adapts and refines the policy at each Transmission Time Interval (TTI) by using a Multi-Layer Perceptron Neural Network (MLPNN) as a non-linear function approximator mapping the continuous scheduler state to the optimal GPF parameter(s). The MLPNN is trained with Continuous Actor-Critic Learning Automata Reinforcement Learning (CACLA RL). The two-parameter GPF optimization problem is addressed by a CACLA variant with two continuous actions (CACLA-2). Five reinforcement learning algorithms serving as single-parameter baselines are compared against the proposed technique. Simulation results indicate that CACLA-2 performs considerably better than any candidate that adjusts only one scheduling parameter, such as CACLA-1. In particular, CACLA-2 outperforms CACLA-1 by reducing the percentage of TTIs in which the system is considered unfair. By attenuating fluctuations in the learned policy, CACLA-2 also achieves a higher throughput gain when severe changes occur in the scheduling environment, while at the same time maintaining the fairness optimality condition.
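For context, the GPF rule referred to above (in its standard form from the OFDMA scheduling literature; the paper's exact notation may differ) assigns resource block k to the user i that maximizes the metric m_{i,k} = r_{i,k}^alpha / R_i^beta, where r_{i,k} is the instantaneous achievable rate of user i on resource block k, R_i is the exponentially averaged past throughput of user i, and the exponents alpha and beta control the throughput-fairness trade-off (alpha = beta = 1 recovers classical proportional fair scheduling). CACLA-1 adapts a single exponent per TTI, whereas CACLA-2 adapts both.

Below is a minimal sketch of one CACLA-2 step, assuming one-hidden-layer MLPs for actor and critic, a TD(0) critic, and Gaussian exploration. The state features, reward signal, network sizes, and the env_step hook are hypothetical placeholders for illustration, not the paper's simulator or hyper-parameters.

import numpy as np

rng = np.random.default_rng(0)

class MLP:
    """One-hidden-layer perceptron trained by plain SGD on a squared-error loss."""
    def __init__(self, n_in, n_hidden, n_out, lr=0.01):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, x):
        self.x = x
        self.h = np.tanh(x @ self.W1 + self.b1)
        return self.h @ self.W2 + self.b2

    def update(self, x, target):
        """One gradient step moving the output for x toward target."""
        y = self.forward(x)
        err = y - target                          # dLoss/dy
        dh = (err @ self.W2.T) * (1 - self.h ** 2)
        self.W2 -= self.lr * np.outer(self.h, err)
        self.b2 -= self.lr * err
        self.W1 -= self.lr * np.outer(self.x, dh)
        self.b1 -= self.lr * dh

N_FEATURES = 4      # scheduler-state features (e.g., fairness index, load); placeholder
GAMMA = 0.95        # discount factor (placeholder value)
SIGMA = 0.1         # std. dev. of Gaussian exploration around the actor output

actor = MLP(N_FEATURES, 16, 2)   # outputs the two GPF exponents (alpha, beta)
critic = MLP(N_FEATURES, 16, 1)  # outputs the state value V(s)

def cacla2_step(s, env_step):
    """One TTI: act, observe the reward, and apply the CACLA update rule."""
    a = actor.forward(s)
    a_explored = a + rng.normal(0, SIGMA, size=2)   # Gaussian exploration
    r, s_next = env_step(a_explored)                # schedule one TTI with (alpha, beta)
    # Critic: standard TD(0) update toward r + gamma * V(s').
    td_target = r + GAMMA * critic.forward(s_next)[0]
    delta = td_target - critic.forward(s)[0]
    critic.update(s, np.array([td_target]))
    # Actor (CACLA rule): only if the explored action beat the critic's
    # expectation (delta > 0) is the actor output pulled toward it.
    if delta > 0:
        actor.update(s, a_explored)
    return s_next

The essence of the CACLA rule is the sign test on the temporal-difference error: the actor moves toward the explored action only when that action proved better than the critic's current estimate, which keeps the learned (alpha, beta) policy from drifting on noisy rewards.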
