Episodic Exploration for Deep Deterministic Policies for StarCraft Micromanagement
Nicolas Usunier | Soumith Chintala | Gabriel Synnaeve | Zeming Lin
[1] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[2] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[3] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[4] Siming Liu, et al. Evolving effective micro behaviors in RTS game, 2014, 2014 IEEE Conference on Computational Intelligence and Games.
[5] Santiago Ontañón, et al. A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft, 2013, IEEE Transactions on Computational Intelligence and AI in Games.
[6] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[7] Ian D. Watson, et al. Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft:Broodwar, 2012, 2012 IEEE Conference on Computational Intelligence and Games (CIG).
[8] Michael Buro, et al. Fast Heuristic Search for RTS Game Combat Scenarios, 2012, AIIDE.
[9] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[10] Frank Sehnke, et al. Parameter-exploring policy gradients, 2010, Neural Networks.
[11] Frank Sehnke, et al. Policy Gradients with Parameter-Based Exploration for Control, 2008, ICANN.
[12] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[13] Sylvain Gelly, et al. Exploration exploitation in Go: UCT for Monte-Carlo Go, 2006, NIPS 2006.
[14] Bhaskara Marthi, et al. Concurrent Hierarchical Reinforcement Learning, 2005, IJCAI.
[15] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[17] Gerald Tesauro, et al. Extending Q-Learning to General Adaptive Multi-Agent Systems, 2003, NIPS.
[18] Shie Mannor, et al. The Cross Entropy Method for Fast Policy Search, 2003, ICML.
[19] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.
[20] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[21] Manuela M. Veloso, et al. Team-partitioned, opaque-transition reinforcement learning, 1999, AGENTS '99.
[22] James C. Spall, et al. A one-measurement form of simultaneous perturbation stochastic approximation, 1997, Autom.
[23] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[24] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[25] Ming Tan, et al. Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents, 1997, ICML.
[26] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.