A conceptual framework for externally-influenced agents: an assisted reinforcement learning review

A long-term goal for reinforcement learning agents is to perform tasks in complex real-world scenarios. Using external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration and interoperability between the different approaches that use external information. In this work, we propose a conceptual framework and taxonomy for assisted reinforcement learning, aimed at fostering such collaboration by classifying and comparing the various methods that use external information in the learning process. The proposed taxonomy details the relationship between the external information source and the learner agent, highlighting how information is decomposed, structured, and retained, and how it can be used to influence agent learning. In addition to reviewing state-of-the-art methods, we identify current streams of reinforcement learning that use external information to improve the agent's performance and decision-making process. These include heuristic reinforcement learning, interactive reinforcement learning, learning from demonstration, transfer learning, and learning from multiple sources, among others. These streams share the objective of scaffolding the learner agent. Lastly, we discuss possibilities for future work in the field of assisted reinforcement learning systems.
