Reinforcement learning in robotic applications: a comprehensive survey

Artificial intelligence (AI) is increasingly used to build complex automated control systems, yet researchers are still working toward a completely autonomous system that resembles a human being. Researchers in AI see a strong connection between the learning patterns of humans and those of AI, and their analyses indicate that machine learning (ML) algorithms can be effective for building self-learning systems. ML is a subfield of AI, and within it reinforcement learning (RL) is the only available methodology that resembles the learning mechanism of the human brain. RL is therefore expected to play a key role in the creation of autonomous robotic systems. In recent years, RL has been applied on many robotic platforms, including air-based, underwater, and land-based systems, and has had considerable success in solving complex tasks. This paper presents a brief overview of the application of reinforcement learning algorithms in robotics. The survey offers a comprehensive review organized into four segments: (1) the development of RL; (2) types of RL algorithms, such as actor-critic, deep RL, multi-agent RL, and human-centered algorithms; (3) applications of RL in robotics grouped by usage platform, namely land-based, water-based, and air-based; and (4) the RL algorithms and mechanisms used in robotic applications. Finally, an open discussion raises a range of potential future research directions in robotics. The objective of this survey is to provide a point of guidance that steers future research in a more meaningful direction.
