Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games
暂无分享,去创建一个
Honglak Lee | Richard L. Lewis | Xiaoxiao Guo | Satinder P. Singh | Satinder Singh | Honglak Lee | Xiaoxiao Guo
[1] Jürgen Schmidhuber,et al. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments , 1990, 1990 IJCNN International Joint Conference on Neural Networks.
[2] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[3] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[4] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[5] Daphne Koller,et al. Making Rational Decisions Using Adaptive Utility Elicitation , 2000, AAAI/IAAI.
[6] Andrew Y. Ng,et al. Pharmacokinetics of a novel formulation of ivermectin after administration to goats , 2000, ICML.
[7] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
[8] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[9] Yoshua. Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[10] Michael L. Littman,et al. Potential-based Shaping in Model-based Reinforcement Learning , 2008, AAAI.
[11] Levente Kocsis,et al. Transpositions and move groups in Monte Carlo tree search , 2008, 2008 IEEE Symposium On Computational Intelligence and Games.
[12] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.
[13] Richard L. Lewis,et al. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective , 2010, IEEE Transactions on Autonomous Mental Development.
[14] Richard L. Lewis,et al. Reward Design via Online Gradient Ascent , 2010, NIPS.
[15] Richard L. Lewis,et al. Internal Rewards Mitigate Agent Boundedness , 2010, ICML.
[16] David Silver,et al. Monte-Carlo tree search and rapid action value estimation in computer Go , 2011, Artif. Intell..
[17] Marc G. Bellemare,et al. Investigating Contingency Awareness Using Atari 2600 Games , 2012, AAAI.
[18] Marc G. Bellemare,et al. Sketch-Based Linear Value Function Approximation , 2012, NIPS.
[19] Marc G. Bellemare,et al. Bayesian Learning of Recursively Factored Environments , 2013, ICML.
[20] Thomas G. Dietterich,et al. State Aggregation in Monte Carlo Tree Search , 2014, AAAI.
[21] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.
[22] Nan Jiang,et al. Improving UCT planning via approximate homomorphisms , 2014, AAMAS.
[23] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[24] Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.
[25] Erik Talvitie,et al. Improving Exploration in UCT Using Local Manifolds , 2015, AAAI.
[26] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[27] Hector Geffner,et al. Classical Planning with Simulators: Results on the Atari Video Games , 2015, IJCAI.
[28] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[29] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[30] Honglak Lee,et al. Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.
[31] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.