Nicolas Usunier | Soumith Chintala | Gabriel Synnaeve | Zeming Lin
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] J. Kiefer, et al. Stochastic Estimation of the Maximum of a Regression Function, 1952.
[3] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[4] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[5] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[6] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[7] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[8] James C. Spall, et al. A one-measurement form of simultaneous perturbation stochastic approximation, 1997, Autom.
[9] Michael P. Wellman, et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm, 1998, ICML.
[10] Manuela M. Veloso, et al. Team-partitioned, opaque-transition reinforcement learning, 1999, AGENTS '99.
[11] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[12] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.
[13] Shie Mannor, et al. The Cross Entropy Method for Fast Policy Search, 2003, ICML.
[14] Gerald Tesauro, et al. Extending Q-Learning to General Adaptive Multi-Agent Systems, 2003, NIPS.
[15] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[16] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[18] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[19] András Lörincz, et al. Learning Tetris Using the Noisy Cross-Entropy Method, 2006, Neural Computation.
[20] Sylvain Gelly, et al. Exploration exploitation in Go: UCT for Monte-Carlo Go, 2006, NIPS.
[21] Sridhar Mahadevan, et al. Hierarchical multi-agent reinforcement learning, 2001, AGENTS '01.
[22] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[23] Frank Sehnke, et al. Policy Gradients with Parameter-Based Exploration for Control, 2008, ICANN.
[24] Ludovic Denoyer, et al. Structured prediction with reinforcement learning, 2009, Machine Learning.
[25] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[26] Tom Schaul, et al. Exploring parameter space in reinforcement learning, 2010, Paladyn J. Behav. Robotics.
[27] Gabriel Synnaeve, et al. A Bayesian model for RTS units control applied to StarCraft, 2011, IEEE Conference on Computational Intelligence and Games (CIG'11).
[28] Ian D. Watson, et al. Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft: Broodwar, 2012, IEEE Conference on Computational Intelligence and Games (CIG).
[29] Michael Buro, et al. Fast Heuristic Search for RTS Game Combat Scenarios, 2012, AIIDE.
[30] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[31] Santiago Ontañón, et al. A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft, 2013, IEEE Transactions on Computational Intelligence and AI in Games.
[32] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[33] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[34] Siming Liu, et al. Evolving effective micro behaviors in RTS game, 2014, IEEE Conference on Computational Intelligence and Games.
[35] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[36] Martin J. Wainwright, et al. Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations, 2013, IEEE Transactions on Information Theory.
[37] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[38] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[39] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[40] Fernando Diaz, et al. Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains, 2016, ArXiv.
[41] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[42] Rob Fergus, et al. Learning Multiagent Communication with Backpropagation, 2016, NIPS.
[43] Honglak Lee, et al. Control of Memory, Active Perception, and Action in Minecraft, 2016, ICML.
[44] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[45] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.