Sarit Kraus | Ariel Rosenfeld | Matthew E. Taylor
[1] Sarit Kraus,et al. Predicting Human Decision-Making: From Prediction to Action , 2018, Predicting Human Decision-Making.
[3] David Sarne,et al. Intelligent Advice Provisioning for Repeated Interaction , 2016, AAAI.
[4] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SIGART Bull..
[6] Preben Alstrøm,et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping , 1998, ICML.
[7] Sam Devlin,et al. Multi-agent reward shaping for RoboCup KeepAway , 2011, AAMAS.
[8] Reda Alhajj,et al. Positive Impact of State Similarity on Reinforcement Learning Performance , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[9] A. Landfield,et al. Personal Construct Psychology , 1980 .
[10] Sonia Chernova,et al. Learning from Demonstration for Shaping through Inverse Reinforcement Learning , 2016, AAMAS.
[11] Matthew E. Taylor,et al. Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence , 2014, AAAI.
[12] Ian H. Witten,et al. Data Mining: Practical Machine Learning Tools and Techniques , 2014 .
[13] Amos Azaria,et al. Advice Provision for Energy Saving in Automobile Climate-Control System , 2015, AI Mag..
[14] Sarit Kraus,et al. Speeding up Tabular Reinforcement Learning Using State-Action Similarities , 2017, AAMAS.
[15] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[16] Noa Agmon,et al. Intelligent agent supporting human-multi-robot team collaboration , 2015, Artif. Intell..
[17] David L. Roberts,et al. A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans , 2016, AAMAS.
[18] Peter Stone,et al. Model-based function approximation in reinforcement learning , 2007, AAMAS '07.
[19] Carlos H. C. Ribeiro. Attentional Mechanisms as a Strategy for Generalization in the Q-Learning Algorithm , 1995 .
[20] James S. Albus,et al. Brains, behavior, and robotics , 1981 .
[21] Peter Stone,et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots , 2012, Machine Learning.
[22] Graham Kendall,et al. Editorial: IEEE Transactions on Computational Intelligence and AI in Games , 2015, IEEE Trans. Comput. Intell. AI Games.
[23] Bart De Schutter,et al. Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .
[24] Jerome S. Bruner,et al. Going Beyond the Information Given , 2006 .
[26] Peter Stone,et al. Combining manual feedback with subsequent MDP reward signals for reinforcement learning , 2010, AAMAS.
[27] R. Lathe. PhD by thesis , 1988, Nature.
[28] Csaba Szepesvári,et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms , 1999, Neural Computation.
[29] Reinaldo A. C. Bianchi,et al. Heuristically-Accelerated Reinforcement Learning: A Comparative Analysis of Performance , 2013, TAROS.
[30] Julian Togelius,et al. The Mario AI Benchmark and Competitions , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[31] Jerome S. Bruner,et al. Contemporary approaches to cognition , 1958 .
[32] Sarit Kraus,et al. Leveraging human knowledge in tabular reinforcement learning: a study of human subjects , 2018, Knowl. Eng. Rev..
[33] Balaraman Ravindran,et al. On the hardness of finding symmetries in Markov decision processes , 2008, ICML '08.
[34] Michael L. Littman,et al. Efficient Reinforcement Learning with Relocatable Action Models , 2007, AAAI.
[35] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[36] Sarit Kraus,et al. Providing Arguments in Discussions on the Basis of the Prediction of Human Argumentative Behavior , 2016, ACM Trans. Interact. Intell. Syst..
[37] Sonia Chernova,et al. Reinforcement Learning from Demonstration through Shaping , 2015, IJCAI.
[38] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[39] S. Hart,et al. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research , 1988 .
[40] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[41] Peter Stone,et al. Keepaway Soccer: From Machine Learning Testbed to Benchmark , 2005, RoboCup.
[42] Amos Azaria,et al. Adaptive Advice in Automobile Climate Control Systems , 2015, AAAI Workshop: AI for Transportation.
[44] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[45] M. Benda,et al. On Optimal Cooperation of Knowledge Sources , 1985 .
[46] Reinaldo A. C. Bianchi,et al. Heuristically-Accelerated Multiagent Reinforcement Learning , 2014, IEEE Transactions on Cybernetics.
[47] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.
[48] Claudia V. Goldman,et al. Online Prediction of Exponential Decay Time Series with Human-Agent Application , 2016, ECAI.
[49] Ana Paiva,et al. An Associative State-Space Metric for Learning in Factored MDPs , 2013, EPIA.
[50] Brian Tanner,et al. RL-Glue: Language-Independent Software for Reinforcement-Learning Experiments , 2009, J. Mach. Learn. Res..
[51] Xiaodong Li,et al. Dynamic Choice of State Abstraction in Q-Learning , 2016, ECAI.
[52] Tucker R. Balch,et al. Symmetry in Markov Decision Processes and its Implications for Single Agent and Multiagent Learning , 2001, ICML.