暂无分享,去创建一个
Michael L. Littman | Jun Ki Lee | Dilip Arumugam | Sophie Saskin | M. Littman | Dilip Arumugam | Jun Ki Lee | S. Saskin
[1] Andrea Lockerd Thomaz,et al. Policy Shaping: Integrating Human Feedback with Reinforcement Learning , 2013, NIPS.
[2] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[3] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[4] Sergey Levine,et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.
[5] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[6] Geoffrey E. Hinton. Reducing the Dimensionality of Data with Neural , 2008 .
[7] Wolfgang Ertel,et al. Reinforcement learning combined with human feedback in continuous state and action spaces , 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[8] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[9] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[10] Jürgen Schmidhuber,et al. Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.
[11] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[12] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..
[13] Peter Stone,et al. Reinforcement learning from simultaneous human and MDP reward , 2012, AAMAS.
[14] Ufuk Topcu,et al. Environment-Independent Task Specifications via GLTL , 2017, ArXiv.
[15] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[16] Craig Boutilier,et al. Rewarding Behaviors , 1996, AAAI/IAAI, Vol. 2.
[17] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[18] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[19] Maya Cakmak,et al. Power to the People: The Role of Humans in Interactive Machine Learning , 2014, AI Mag..
[20] Peter Stone,et al. Combining manual feedback with subsequent MDP reward signals for reinforcement learning , 2010, AAMAS.
[21] Peter Stone,et al. Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces , 2017, AAAI.
[22] Honglak Lee,et al. Control of Memory, Active Perception, and Action in Minecraft , 2016, ICML.
[23] R. French. Catastrophic forgetting in connectionist networks , 1999, Trends in Cognitive Sciences.
[24] Shie Mannor,et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft , 2016, AAAI.
[25] Katja Hofmann,et al. The Malmo Platform for Artificial Intelligence Experimentation , 2016, IJCAI.
[26] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[27] Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.
[28] Jing Peng,et al. Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .
[29] David L. Roberts,et al. Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning , 2015, Autonomous Agents and Multi-Agent Systems.
[30] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[31] Andrea Lockerd Thomaz,et al. Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance , 2006, AAAI.
[32] Stefanie Tellex,et al. Minecraft as an Experimental World for AI in Robotics , 2015, AAAI Fall Symposia.
[33] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[34] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[35] P. Stone,et al. TAMER: Training an Agent Manually via Evaluative Reinforcement , 2008, 2008 7th IEEE International Conference on Development and Learning.
[36] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[37] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.