An improved reinforcement learning algorithm based on knowledge transfer and applications in autonomous vehicles

Abstract Autonomous learning is a crucially important capability of intelligent robots. As one of the most popular machine learning techniques, reinforcement learning (RL) enables an agent to take optimized actions by interacting with its environment so as to maximize some notion of cumulative reward. In this paper, an improved RL algorithm, named the KT-HA-Q(λ) algorithm, is proposed by exploiting knowledge transfer from a source domain. First, a BP neural network and a linear sensor network are constructed to transfer knowledge from the source task for weight initialization in the target task, and to transfer knowledge about the actions in the case base obtained from the source domain, respectively. Then, a novel case-base expansion and progressive forgetting criterion, which balances new experience acquired via online learning against historical experience stored in the case base, is developed to improve learning efficiency and learning rate. Furthermore, an improved heuristic function is proposed by replacing the action traditionally obtained via a selection strategy with the experience action; this function plays a crucial role in both selecting the best action and computing its Q value. Finally, the proposed algorithm is applied to a hill-climbing experiment with unmanned vehicles in a complex 3D scene by transferring knowledge obtained in a 2D scene. The results of comparative experiments verify the advantages and effectiveness of the proposed algorithm.
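
To make the heuristically accelerated action selection described above concrete, the following is a minimal sketch, not the authors' implementation: it assumes a tabular Q of shape (n_states, n_actions), a case_base dictionary mapping states to "experience actions" transferred from the source domain, and hypothetical weighting parameters xi and eta that are not specified in the abstract.

```python
import numpy as np

def heuristic(Q, state, exp_action, eta=1.0):
    """H(s, a): boost the experience action retrieved from the case base
    so that it slightly exceeds the current greedy value (hedged sketch)."""
    H = np.zeros(Q.shape[1])
    if exp_action is not None:
        H[exp_action] = Q[state].max() - Q[state, exp_action] + eta
    return H

def select_action(Q, state, case_base, xi=1.0, epsilon=0.1,
                  rng=np.random.default_rng()):
    """Epsilon-greedy selection over Q(s, a) + xi * H(s, a)."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))      # random exploration
    exp_action = case_base.get(state)             # experience action, may be None
    return int(np.argmax(Q[state] + xi * heuristic(Q, state, exp_action)))

def q_lambda_update(Q, E, s, a, r, s_next, alpha=0.1, gamma=0.95, lam=0.9):
    """One Q(lambda)-style backup with replacing eligibility traces
    (Watkins' trace cutting on exploratory actions omitted for brevity)."""
    delta = r + gamma * Q[s_next].max() - Q[s, a]
    E[s, a] = 1.0                                  # replacing trace
    Q += alpha * delta * E
    E *= gamma * lam
    return Q, E
```

In this reading, the case base supplies the experience action in place of an action chosen purely by the selection strategy, and the heuristic term H(s, a) biases both which action is taken and, through the resulting updates, the Q values learned in the target task.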
