Stuart J. Russell | Brandon Amos | Noam Brown | Arnaud Fickinger | Hengyuan Hu
[1] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[2] Rémi Munos, et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, Found. Trends Mach. Learn.
[3] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.
[4] Nikos A. Vlassis, et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs, 2008, J. Artif. Intell. Res.
[5] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[6] Adam Lerer, et al. Learned Belief Search: Efficiently Improving Policies in Partially Observable Settings, 2021, ArXiv.
[7] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[8] E. J. Sondik. The Optimal Control of Partially Observable Markov Decision Processes, 1971.
[9] Michael H. Bowling, et al. Monte Carlo Tree Search in Continuous Action Spaces with Execution Uncertainty, 2016, IJCAI.
[11] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[12] Michal Valko, et al. Monte-Carlo Tree Search as Regularized Policy Optimization, 2020, ICML.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] David Barber, et al. Thinking Fast and Slow with Deep Learning and Tree Search, 2017, NIPS.
[15] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.
[16] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[17] N. Whitman. A bitter lesson, 1999, Academic Medicine: Journal of the Association of American Medical Colleges.
[18] Jakob N. Foerster, et al. "Other-Play" for Zero-Shot Coordination, 2020, ICML.
[19] Noam Brown, et al. Superhuman AI for multiplayer poker, 2019, Science.
[20] Denis Yarats, et al. The Differentiable Cross-Entropy Method, 2020, ICML.
[21] Ivan Laptev, et al. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014, IEEE Conference on Computer Vision and Pattern Recognition.
[23] David Silver, et al. Online and Offline Reinforcement Learning by Planning with a Learned Model, 2021, NeurIPS.
[24] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[25] Gerald Tesauro. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[26] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[27] Jakob N. Foerster, et al. Improving Policies via Search in Cooperative Partially Observable Games, 2019, AAAI.
[28] Hao Li, et al. Visualizing the Loss Landscape of Neural Nets, 2017, NeurIPS.
[29] Jessica B. Hamrick, et al. Combining Q-Learning and Search with Amortized Value Estimates, 2020, ICLR.
[30] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[31] Jackie Kay, et al. Local Search for Policy Iteration in Continuous Control, 2020, ArXiv.
[32] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[34] Tim Salimans, et al. Policy Gradient Search: Online Planning and Expert Iteration without Search Trees, 2019, ArXiv.
[35] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[36] Sergey Levine, et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, 2018, ArXiv.
[37] Alessandro Davide Ialongo, et al. Iterative Amortized Policy Optimization, 2020, NeurIPS.
[39] Jimmy Ba, et al. Exploring Model-based Planning with Policy Networks, 2019, ICLR.
[40] Daniel Guo, et al. Agent57: Outperforming the Atari Human Benchmark, 2020, ICML.
[41] H. Francis Song, et al. Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning, 2018, ICML.
[42] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[43] David Silver, et al. Learning and Planning in Complex Action Spaces, 2021, ICML.
[44] Chao Yang, et al. A Survey on Deep Transfer Learning, 2018, ICANN.
[45] Adam Lerer, et al. Combining Deep Reinforcement Learning and Search for Imperfect-Information Games, 2020, NeurIPS.
[46] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[47] Shie Mannor, et al. Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning, 2018, NeurIPS.
[48] Hengyuan Hu, et al. Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning, 2020, ICLR.
[49] Peter I. Cowling, et al. Information Set Monte Carlo Tree Search, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[50] H. Francis Song, et al. The Hanabi Challenge: A New Frontier for AI Research, 2019, Artif. Intell.