Optimal Elevator Group Control via Deep Asynchronous Actor–Critic Learning

In this article, a deep reinforcement learning (RL) method, the asynchronous advantage actor–critic (A3C) method, is developed to solve the optimal control problem of elevator group control systems (EGCSs). The main contribution is an optimal control law for EGCSs, designed via deep RL, under which the elevator system delivers passengers to their destination floors as quickly as possible. Deep convolutional and recurrent neural networks, which can continue to update themselves during operation, are designed to dispatch the elevators. The structure of the A3C method is then developed, and the training phase for learning the optimal control law is discussed. Finally, simulation results show that the developed method effectively reduces the average waiting time in a complex building environment, and comparisons with traditional algorithms further verify its effectiveness.
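To make the "advantage" in A3C concrete, the following is a minimal sketch of the n-step return and advantage computation that each asynchronous worker typically performs before updating the shared actor and critic. It is a generic illustration, not the article's implementation: the function names, the discount factor, and the choice of a reward of -1 per time step that a passenger waits are all assumptions made for this example.

```python
from typing import List


def n_step_returns(rewards: List[float], bootstrap_value: float,
                   gamma: float) -> List[float]:
    """Discounted n-step returns, computed backwards from the last step.

    bootstrap_value is the critic's estimate for the state reached after
    the last reward (use 0.0 if the rollout ended in a terminal state).
    """
    returns: List[float] = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards: List[float], values: List[float],
               bootstrap_value: float, gamma: float) -> List[float]:
    """Advantage estimate A_t = R_t - V(s_t) for each step of a rollout."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    return [ret - value for ret, value in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical rollout: one waiting passenger costs -1 per time step.
    rewards = [-1.0, -1.0, -1.0]
    values = [-2.0, -1.5, -0.5]  # assumed critic outputs V(s_t)
    print(n_step_returns(rewards, bootstrap_value=0.0, gamma=0.5))
    print(advantages(rewards, values, bootstrap_value=0.0, gamma=0.5))
```

In an actor–critic update of this kind, the actor is typically trained to raise the log-probability of actions with positive advantage, while the critic is regressed toward the n-step returns; a dispatching policy trained this way would be pushed toward actions that shorten waiting time under the assumed reward.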
