暂无分享,去创建一个
[1] Uri Zwick,et al. SOKOBAN and other motion planning problems , 1999, Comput. Geom..
[2] Zheng Wen,et al. Deep Exploration via Randomized Value Functions , 2017, J. Mach. Learn. Res..
[3] Lior Rokach,et al. Ensemble-based classifiers , 2010, Artificial Intelligence Review.
[4] Catholijn M. Jonker,et al. Efficient exploration with Double Uncertain Value Networks , 2017, ArXiv.
[5] Csaba Szepesvári,et al. Tuning Bandit Algorithms in Stochastic Environments , 2007, ALT.
[6] Jan Willemson,et al. Improved Monte-Carlo Search , 2006 .
[7] Albin Cassirer,et al. Randomized Prior Functions for Deep Reinforcement Learning , 2018, NeurIPS.
[8] Satinder Singh,et al. Self-Imitation Learning , 2018, ICML.
[9] J. Andrew Bagnell,et al. Reinforcement and Imitation Learning via Interactive No-Regret Learning , 2014, ArXiv.
[10] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[11] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[12] Andrzej Janusz,et al. Improving Hearthstone AI by Combining MCTS and Supervised Learning Algorithms , 2018, 2018 IEEE Conference on Computational Intelligence and Games (CIG).
[13] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.
[14] Demis Hassabis,et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.
[15] Dale Schuurmans,et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning , 2019, ArXiv.
[16] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[17] Sebastian Tschiatschek,et al. Successor Uncertainties: exploration and uncertainty in temporal difference learning , 2018, NeurIPS.
[18] Kenneth O. Stanley,et al. Go-Explore: a New Approach for Hard-Exploration Problems , 2019, ArXiv.
[19] Satinder Singh,et al. Value Prediction Network , 2017, NIPS.
[20] Mohammad Norouzi,et al. An Optimistic Perspective on Offline Reinforcement Learning , 2020, ICML.
[21] Tor Lattimore,et al. Behaviour Suite for Reinforcement Learning , 2019, ICLR.
[22] Sergey Levine,et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models , 2018, NeurIPS.
[23] Charles Blundell,et al. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.
[24] Sergey Levine,et al. Model-Based Reinforcement Learning for Atari , 2019, ICLR.
[25] Sham M. Kakade,et al. Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control , 2018, ICLR.
[26] Razvan Pascanu,et al. Learning model-based planning from scratch , 2017, ArXiv.
[27] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.
[28] Tom Eccles,et al. An investigation of model-free planning , 2019, ICML.
[29] Pieter Abbeel,et al. Adaptive Online Planning for Continual Lifelong Learning , 2019, ArXiv.
[30] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[31] Michèle Sebag,et al. The grand challenge of computer Go , 2012, Commun. ACM.
[32] Levente Kocsis,et al. Transpositions and move groups in Monte Carlo tree search , 2008, 2008 IEEE Symposium On Computational Intelligence and Games.
[33] Stefanie Tellex,et al. Deep Abstract Q-Networks , 2017, AAMAS.
[34] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[35] Ian Osband,et al. The Uncertainty Bellman Equation and Exploration , 2017, ICML.
[36] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[37] Laurent Orseau,et al. Single-Agent Policy Tree Search With Guarantees , 2018, NeurIPS.
[38] Richard B. Segal,et al. On the Scalability of Parallel UCT , 2010, Computers and Games.
[39] Leo Breiman,et al. Bagging Predictors , 1996, Machine Learning.
[40] Pierre Baldi,et al. Solving the Rubik's Cube with Approximate Policy Iteration , 2018, ICLR.
[41] David Barber,et al. Thinking Fast and Slow with Deep Learning and Tree Search , 2017, NIPS.
[42] Samy Bengio,et al. Efficient Exploration with Self-Imitation Learning via Trajectory-Conditioned Policy , 2019, ArXiv.
[43] Pieter Abbeel,et al. Model-Ensemble Trust-Region Policy Optimization , 2018, ICLR.
[44] Shimon Whiteson,et al. TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning , 2017, ICLR 2018.
[45] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[46] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[47] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[48] Taehoon Kim,et al. Quantifying Generalization in Reinforcement Learning , 2018, ICML.