A Novel Hierarchical Soft Actor-Critic Algorithm for Multi-Logistics Robots Task Allocation

In goods-to-man systems of intelligent unmanned warehouses, task allocation strongly affects efficiency because both the AGV robots and the orders change dynamically. This paper presents a hierarchical Soft Actor-Critic algorithm to solve the dynamic scheduling problem of order picking. The proposed method builds on the classic Soft Actor-Critic algorithm and on hierarchical reinforcement learning. The model is trained at different time scales by introducing sub-goals: the top level learns a high-level policy, and the bottom level learns a policy to achieve the sub-goals. The actor of the controller aims to maximize the expected intrinsic reward while also maximizing entropy, that is, to succeed at the sub-goals while acting as randomly as possible. Finally, simulation experiments in different scenes show that the method enables multiple logistics AGV robots to work together and improves the reward in sparse environments by a factor of about 2.61 compared with the SAC algorithm.
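The controller objective described in the abstract (expected intrinsic reward plus an entropy bonus) can be illustrated with a minimal Python sketch for a discrete action distribution. This is not the paper's implementation: the function names, the temperature `alpha`, and the reward values are assumptions chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(probs, intrinsic_rewards, alpha=0.2):
    """Expected intrinsic reward plus alpha-weighted policy entropy.

    alpha (the temperature) and the reward values passed in are
    illustrative assumptions, not values taken from the paper.
    """
    expected_reward = sum(p * r for p, r in zip(probs, intrinsic_rewards))
    return expected_reward + alpha * entropy(probs)


# Hypothetical controller with three actions; only action 0 reaches the sub-goal.
intrinsic_rewards = [1.0, 0.0, 0.0]
greedy = [1.0, 0.0, 0.0]  # deterministic policy, zero entropy
mixed = [0.8, 0.1, 0.1]   # mostly succeeds, but keeps exploring

print(soft_objective(greedy, intrinsic_rewards))
print(soft_objective(mixed, intrinsic_rewards))
```

With a small temperature the deterministic policy scores higher, while a larger `alpha` favors the mixed policy, which is the trade-off between succeeding at sub-goals and acting randomly that the abstract describes.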
