暂无分享,去创建一个
[1] John Langford,et al. Relating reinforcement learning performance to classification performance , 2005, ICML '05.
[2] William B. Haskell,et al. Empirical Dynamic Programming , 2013, Math. Oper. Res..
[3] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[4] S. Resnick. A Probability Path , 1999 .
[5] Sepp Hochreiter,et al. Self-Normalizing Neural Networks , 2017, NIPS.
[6] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.
[7] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[8] Tristan Cazenave,et al. Nested Monte-Carlo Search , 2009, IJCAI.
[9] David Silver,et al. Combining online and offline knowledge in UCT , 2007, ICML '07.
[10] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.
[11] Nataliya Sokolovska,et al. Continuous Upper Confidence Trees , 2011, LION.
[12] Warren B. Powell,et al. Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds , 2017, ArXiv.
[13] Wouter M. Koolen,et al. Monte-Carlo Tree Search by Best Arm Identification , 2017, NIPS.
[14] Michael P. Wellman,et al. Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..
[15] Pieter Spronck,et al. Monte-Carlo Tree Search: A New Framework for Game AI , 2008, AIIDE.
[16] Rémi Munos,et al. Adaptive play in Texas Hold'em Poker , 2008, ECAI.
[17] Tamás Linder,et al. On the Asymptotic Optimality of Finite Approximations to Markov Decision Processes with Borel Spaces , 2015, Math. Oper. Res..
[18] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[19] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[20] Markus Enzenberger,et al. Evaluation in Go by a Neural Network using Soft Segmentation , 2003, ACG.
[21] Michèle Sebag,et al. Continuous Rapid Action Value Estimates , 2011, ACML 2011.
[22] Bruno Bouzy,et al. Monte-Carlo strategies for computer Go , 2006 .
[23] Eiji Takimoto,et al. Efficient Sampling Method for Monte Carlo Tree Search Problem , 2014, IEICE Trans. Inf. Syst..
[24] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[25] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[26] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[27] Rémi Coulom,et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search , 2006, Computers and Games.
[28] Benjamin Van Roy. Performance Loss Bounds for Approximate Value Iteration with State Aggregation , 2006, Math. Oper. Res..
[29] Michèle Sebag,et al. The grand challenge of computer Go , 2012, Commun. ACM.
[30] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[31] Philip Hingston,et al. Experiments with Monte Carlo Othello , 2007, 2007 IEEE Congress on Evolutionary Computation.
[32] Vadim Bulitko,et al. Focus of Attention in Reinforcement Learning , 2007, J. Univers. Comput. Sci..
[33] Warren B. Powell,et al. The Information-Collecting Vehicle Routing Problem: Stochastic Optimization for Emergency Storm Response , 2016, ArXiv.
[34] Alessandro Lazaric,et al. Analysis of a Classification-based Policy Iteration Algorithm , 2010, ICML.
[35] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[36] David Silver,et al. Monte-Carlo tree search and rapid action value estimation in computer Go , 2011, Artif. Intell..
[37] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[38] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..
[39] F. Dufour,et al. Approximation of Markov decision processes with general state space , 2012 .
[40] D. Pollard. Empirical Processes: Theory and Applications , 1990 .
[41] D. Bertsekas. Convergence of discretization procedures in dynamic programming , 1975 .
[42] Jean Méhat,et al. Combining UCT and Nested Monte Carlo Search for Single-Player General Game Playing , 2010, IEEE Transactions on Computational Intelligence and AI in Games.
[43] Rahul Jain,et al. An Empirical Dynamic Programming Algorithm for Continuous MDPs , 2017, 1709.07506.
[44] David Barber,et al. Thinking Fast and Slow with Deep Learning and Tree Search , 2017, NIPS.
[45] Matthieu Geist,et al. Approximate modified policy iteration and its application to the game of Tetris , 2015, J. Mach. Learn. Res..
[46] David Haussler,et al. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..
[47] Ameet Talwalkar,et al. Foundations of Machine Learning , 2012, Adaptive computation and machine learning.