Speeding up Tabular Reinforcement Learning Using State-Action Similarities
[1] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[2] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SGAR.
[3] Csaba Szepesvári,et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms , 1999, Neural Computation.
[4] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[5] M. Benda,et al. On Optimal Cooperation of Knowledge Sources , 1985 .
[6] Sonia Chernova,et al. Learning from Demonstration for Shaping through Inverse Reinforcement Learning , 2016, AAMAS.
[7] Peter Stone,et al. Keepaway Soccer: From Machine Learning Testbed to Benchmark , 2005, RoboCup.
[8] Matthew E. Taylor,et al. Combining Multiple Correlated Reward and Shaping Signals by Measuring Confidence , 2014, AAAI.
[9] Carlos H. C. Ribeiro. Attentional Mechanisms as a Strategy for Generalization in the Q-Learning Algorithm , 1995 .
[10] Amos Azaria,et al. Adaptive Advice in Automobile Climate Control Systems , 2015, AAAI Workshop: AI for Transportation.
[11] Stephen P. Brooks,et al. Markov Decision Processes , 1989 .
[12] Reinaldo A. C. Bianchi,et al. Heuristically-Accelerated Reinforcement Learning: A Comparative Analysis of Performance , 2013, TAROS.
[13] Reinaldo A. C. Bianchi,et al. Transferring knowledge as heuristics in reinforcement learning: A case-based approach , 2015, Artif. Intell.
[14] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[15] Sonia Chernova,et al. Integrating reinforcement learning with human demonstrations of varying ability , 2011, AAMAS.
[16] Peter Stone,et al. Model-based function approximation in reinforcement learning , 2007, AAMAS '07.
[17] Claudia V. Goldman,et al. Online Prediction of Exponential Decay Time Series with Human-Agent Application , 2016, ECAI.
[18] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[19] Peter Stone,et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots , 2012, Machine Learning.
[20] Sarit Kraus,et al. Providing Arguments in Discussions on the Basis of the Prediction of Human Argumentative Behavior , 2016, ACM Trans. Interact. Intell. Syst.
[21] Bart De Schutter,et al. Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .
[22] Sonia Chernova,et al. Reinforcement Learning from Demonstration through Shaping , 2015, IJCAI.
[23] Jude W. Shavlik,et al. Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression , 2005, AAAI.
[24] Julian Togelius,et al. The Mario AI Benchmark and Competitions , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[25] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[26] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[27] Ariel Rosenfeld,et al. Automated Agents for Advice Provision , 2015, IJCAI.
[28] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[29] Reda Alhajj,et al. Positive Impact of State Similarity on Reinforcement Learning Performance , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[30] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst.
[31] Ana Paiva,et al. An Associative State-Space Metric for Learning in Factored MDPs , 2013, EPIA.
[32] Matthew E. Taylor,et al. Metric learning for reinforcement learning agents , 2011, AAMAS.
[33] Xiaodong Li,et al. Dynamic Choice of State Abstraction in Q-Learning , 2016, ECAI.
[34] David Sarne,et al. Intelligent Advice Provisioning for Repeated Interaction , 2016, AAAI.
[35] Balaraman Ravindran,et al. On the hardness of finding symmetries in Markov decision processes , 2008, ICML '08.
[36] Michael L. Littman,et al. Efficient Reinforcement Learning with Relocatable Action Models , 2007, AAAI.
[37] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[38] Sam Devlin,et al. Expressing Arbitrary Reward Functions as Potential-Based Advice , 2015, AAAI.
[39] Noa Agmon,et al. Intelligent agent supporting human-multi-robot team collaboration , 2015, Artif. Intell.
[40] David L. Roberts,et al. A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans , 2016, AAMAS.
[41] Amos Azaria,et al. Advice Provision for Energy Saving in Automobile Climate-Control System , 2015, AI Mag.
[42] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[43] Tucker R. Balch,et al. Symmetry in Markov Decision Processes and its Implications for Single Agent and Multiagent Learning , 2001, ICML.
[44] David Andre,et al. State abstraction for programmable reinforcement learning agents , 2002, AAAI/IAAI.