Hybrid Online and Offline Reinforcement Learning for Tibetan Jiu Chess
Xiali Li, Licheng Wu, Yue Zhao, Xiaona Xu, Zhengyu Lv
[1] Ernesto Estrada, et al. Path Laplacian operators and superdiffusive processes on graphs. II. Two-dimensional lattice, 2018, Linear Algebra and its Applications.
[2] Ivan Bratko, et al. Pattern-Based Representation of Chess End-Game Knowledge, 1978, Comput. J.
[3] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[4] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.
[5] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[6] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[7] Murray Campbell, et al. Deep Blue, 2002, Artif. Intell.
[8] Tuomas Sandholm, et al. Safe and Nested Subgame Solving for Imperfect-Information Games, 2017, NIPS.
[9] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[10] Terrence J. Sejnowski, et al. Temporal Difference Learning of Position Evaluation in the Game of Go, 1993, NIPS.
[11] Donald F. Beal, et al. First Results from Using Temporal Difference Learning in Shogi, 1998, Computers and Games.
[12] Mark H. M. Winands, et al. Comparing Randomization Strategies for Search-Control Parameters in Monte-Carlo Tree Search, 2019, 2019 IEEE Conference on Games (CoG).
[13] Jiahui Bai, et al. On the Observability of Leader-Based Multiagent Systems with Fixed Topology, 2019, Complexity.
[14] Song Wang, et al. A Reinforcement Learning Model Based on Temporal Difference Algorithm, 2019, IEEE Access.
[15] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[16] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[17] Hitoshi Matsubara, et al. Pattern Recognition for Candidate Generation in the Game of Shogi, 1997.
[18] Xia Chen, et al. A Stochastic Sampling Mechanism for Time-Varying Formation of Multiagent Systems With Multiple Leaders and Communication Delays, 2019, IEEE Transactions on Neural Networks and Learning Systems.
[19] Sylvain Gelly, et al. Modifications of UCT and sequence-like simulations for Monte-Carlo Go, 2007, 2007 IEEE Symposium on Computational Intelligence and Games.
[20] Song Wang, et al. Strategy research based on chess shapes for Tibetan JIU computer game, 2018, J. Int. Comput. Games Assoc.
[21] Stephen Tyree, et al. Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU, 2016, ICLR.
[22] Shi-Jim Yen, et al. Pattern Matching in Go Game Records, 2007, Second International Conference on Innovative Computing, Information and Control (ICICIC 2007).
[23] Song Wang, et al. A Middle Game Search Algorithm Applicable to Low-Cost Personal Computer for Go, 2019, IEEE Access.
[24] Mark H. M. Winands, et al. Real-Time Monte Carlo Tree Search in Ms Pac-Man, 2014, IEEE Transactions on Computational Intelligence and AI in Games.
[25] Rafal Bogacz, et al. Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats, 2012, Front. Comput. Neurosci.
[26] Song Guo, et al. Information and Communications Technologies for Sustainable Development Goals: State-of-the-Art, Needs and Perspectives, 2018, IEEE Communications Surveys & Tutorials.
[27] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[28] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Thore Graepel, et al. Bayesian pattern ranking for move prediction in the game of Go, 2006, ICML.
[30] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[31] Scott D. Goodwin, et al. Knowledge Generation for Improving Simulations in UCT for General Game Playing, 2008, Australasian Conference on Artificial Intelligence.
[32] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[33] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[34] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.
[35] Song Guo, et al. Big Data Meet Green Challenges: Big Data Toward Green Applications, 2016, IEEE Systems Journal.
[36] Sylvain Gelly, et al. Exploration exploitation in Go: UCT for Monte-Carlo Go, 2006, NIPS.
[37] Daisuke Takahashi, et al. A Shogi Program Based on Monte-Carlo Tree Search, 2010, J. Int. Comput. Games Assoc.
[38] Sebastian Thrun, et al. Learning to Play the Game of Chess, 1994, NIPS.
[39] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.
[40] Yaakov HaCohen-Kerner. Learning Strategies for Explanation Patterns: Basic Game Patterns with Application to Chess, 1995, ICCBR.
[41] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[42] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[43] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[44] Razvan Pascanu, et al. Relational Deep Reinforcement Learning, 2018, ArXiv.
[45] Gao Long. Gradient descent Sarsa(λ) algorithm based on the adaptive potential function shaping reward mechanism, 2013.
[46] Xianfu Chen, et al. Energy-Efficiency Oriented Traffic Offloading in Wireless Networks: A Brief Survey and a Learning Approach for Heterogeneous Cellular Networks, 2015, IEEE Journal on Selected Areas in Communications.
[47] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[48] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.