Paper information

Episodic Exploration for Deep Deterministic Policies for StarCraft Micromanagement

Nicolas Usunier | Soumith Chintala | Gabriel Synnaeve | Zeming Lin

[1] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[2] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.

[3] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[4] Siming Liu, et al. Evolving effective micro behaviors in RTS game, 2014, 2014 IEEE Conference on Computational Intelligence and Games.

[5] Santiago Ontañón, et al. A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft, 2013, IEEE Transactions on Computational Intelligence and AI in Games.

[6] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.

[7] Ian D. Watson, et al. Applying reinforcement learning to small scale combat in the real-time strategy game StarCraft: Broodwar, 2012, 2012 IEEE Conference on Computational Intelligence and Games (CIG).

[8] Michael Buro, et al. Fast Heuristic Search for RTS Game Combat Scenarios, 2012, AIIDE.

[9] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.

[10] Frank Sehnke, et al. Parameter-exploring policy gradients, 2010, Neural Networks.

[11] Frank Sehnke, et al. Policy Gradients with Parameter-Based Exploration for Control, 2008, ICANN.

[12] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[13] Sylvain Gelly, et al. Exploration exploitation in Go: UCT for Monte-Carlo Go, 2006, NIPS 2006.

[14] Bhaskara Marthi, et al. Concurrent Hierarchical Reinforcement Learning, 2005, IJCAI.

[15] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.

[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[17] Gerald Tesauro, et al. Extending Q-Learning to General Adaptive Multi-Agent Systems, 2003, NIPS.

[18] Shie Mannor, et al. The Cross Entropy Method for Fast Policy Search, 2003, ICML.

[19] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.

[20] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[21] Manuela M. Veloso, et al. Team-partitioned, opaque-transition reinforcement learning, 1999, AGENTS '99.

[22] James C. Spall, et al. A one-measurement form of simultaneous perturbation stochastic approximation, 1997, Autom.

[23] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.

[24] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.

[25] Ming Tan, et al. Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents, 1997, ICML.

[26] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.