R. Srikant | Joseph Lubars | Michael Livesay | Anna Winnicki
[1] Dimitri Bertsekas. Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control , 2021, ArXiv.
[2] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[3] Catholijn M. Jonker,et al. A Framework for Reinforcement Learning and Planning , 2020, ArXiv.
[4] Arthur Jacot,et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks , 2018, NeurIPS.
[5] Jackie Kay,et al. Local Search for Policy Iteration in Continuous Control , 2020, ArXiv.
[6] D. Bertsekas. Approximate policy iteration: a survey and some new methods , 2011 .
[7] Shie Mannor,et al. Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning , 2018, NeurIPS.
[8] D. Bertsekas. Reinforcement Learning and Optimal Control: A Selective Overview , 2018 .
[9] Andrew Tridgell,et al. TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search , 1999, ArXiv.
[10] Shie Mannor,et al. Beyond the One Step Greedy Approach in Reinforcement Learning , 2018, ICML.
[11] Devavrat Shah,et al. Non-Asymptotic Analysis of Monte Carlo Tree Search , 2019 .
[12] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[13] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[14] Demis Hassabis,et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.
[15] Yuan Cao,et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks , 2019, NeurIPS.
[16] Shie Mannor,et al. Online Planning with Lookahead Policies , 2020, NeurIPS.
[17] Joel Veness,et al. Bootstrapping from Game Tree Search , 2009, NIPS.
[18] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[19] Bruno Scherrer,et al. Non-Stationary Approximate Modified Policy Iteration , 2015, ICML.
[20] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[21] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[22] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[23] Shiqun Yin,et al. Value-based Algorithms Optimization with Discounted Multiple-step Learning Method in Deep Reinforcement Learning , 2020, 2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[24] Shie Mannor,et al. How to Combine Tree-Search Methods in Reinforcement Learning , 2018, AAAI.
[25] Mohammad Ghavamzadeh,et al. Multi-step Greedy Reinforcement Learning Algorithms , 2020, ICML.
[26] Nathan R. Sturtevant,et al. Monte Carlo Tree Search with heuristic evaluations using implicit minimax backups , 2014, 2014 IEEE Conference on Computational Intelligence and Games.
[27] John N. Tsitsiklis,et al. On the Convergence of Optimistic Policy Iteration , 2002, J. Mach. Learn. Res..
[28] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.