Approximate Dynamic Programming
[1] Olivier Sigaud, et al. Learning the structure of Factored Markov Decision Processes in reinforcement learning problems, 2006, ICML.
[2] Shie Mannor, et al. Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems, 2009, 2009 American Control Conference.
[3] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] Adam Krzyzak, et al. A Distribution-Free Theory of Nonparametric Regression, 2002, Springer series in statistics.
[5] Rémi Munos, et al. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim.
[6] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[7] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[8] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[9] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[10] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[11] S. Mallat, et al. Adaptive greedy approximations, 1997.
[12] Rémi Munos, et al. Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation, 2005, J. Mach. Learn. Res.
[13] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[14] David Haussler, et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension, 1995, J. Comb. Theory, Ser. A.
[15] Andrew W. Moore, et al. Locally Weighted Learning, 1997, Artificial Intelligence Review.
[16] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[17] Arthur L. Samuel, et al. Some studies in machine learning using the game of checkers, 2000, IBM J. Res. Dev.
[18] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[19] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[20] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[21] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[22] Leslie G. Valiant, et al. A theory of the learnable, 1984, STOC '84.
[23] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[24] A. L. Samuel, et al. Some studies in machine learning using the game of checkers. II: recent progress, 1967.