An Empirical Dynamic Programming Algorithm for Continuous MDPs
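To orient the reader, the sketch below illustrates the empirical value iteration idea underlying this line of work (see [1] and [2]): the expectation in the Bellman operator is replaced by a sample average over simulated next states, combined here with a naive grid discretization of a one-dimensional continuous state space. The dynamics, reward, grid, action set, and sample counts are hypothetical stand-ins chosen for illustration, not the paper's actual model or algorithm.

```python
# Minimal sketch of empirical value iteration (EVI) on a discretized 1-D
# continuous-state MDP. Only simulation access to the transition kernel is
# assumed; the model below is a hypothetical example, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

GAMMA = 0.9                            # discount factor
N_SAMPLES = 64                         # next-state samples per (state, action)
GRID = np.linspace(-1.0, 1.0, 41)      # grid over the state space [-1, 1]
ACTIONS = np.array([-0.1, 0.0, 0.1])   # small finite action set (assumed)

def reward(s, a):
    # Hypothetical running reward: stay near the origin, penalize effort.
    return -(s ** 2) - 0.1 * (a ** 2)

def sample_next_states(s, a, n):
    # Hypothetical dynamics: drift by the action plus Gaussian noise,
    # clipped back into the state space.
    return np.clip(s + a + 0.05 * rng.standard_normal(n), -1.0, 1.0)

def interp_value(v, states):
    # Evaluate the gridded value estimate off-grid by linear interpolation.
    return np.interp(states, GRID, v)

def empirical_bellman(v):
    # Empirical Bellman operator: the exact expectation E[v(s')] is
    # replaced by the sample mean over simulated next states.
    v_new = np.empty_like(v)
    for i, s in enumerate(GRID):
        q = [
            reward(s, a)
            + GAMMA * interp_value(v, sample_next_states(s, a, N_SAMPLES)).mean()
            for a in ACTIONS
        ]
        v_new[i] = max(q)
    return v_new

v = np.zeros_like(GRID)
for _ in range(200):
    v = empirical_bellman(v)           # iterate the random operator

print("estimated value at s = 0:", interp_value(v, 0.0))
```

Because the empirical Bellman operator is random, its iterates converge only in a probabilistic sense; the cited works quantify this error with concentration arguments and, for continuous state spaces, replace the naive grid above with randomized function fitting.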
[1] Pengqian Yu, et al. Randomized function fitting-based empirical value iteration, 2017, IEEE 56th Annual Conference on Decision and Control (CDC).
[2] William B. Haskell, et al. Empirical Dynamic Programming, 2013, Math. Oper. Res.
[3] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[4] Vivek F. Farias, et al. Non-parametric Approximate Dynamic Programming via the Kernel Method, 2012, NIPS.
[5] Guy Lever, et al. Modelling transition dynamics in MDPs with RKHS embeddings, 2012, ICML.
[6] Jan Peters, et al. Policy Gradient Methods, 2010, Encyclopedia of Machine Learning.
[7] Pravin Varaiya, et al. Simulation-based optimization of Markov decision processes: An empirical process theory approach, 2010, Autom.
[8] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, 3rd Edition, Volume II, 2010.
[9] Panos M. Pardalos, et al. Approximate dynamic programming: solving the curses of dimensionality, 2009, Optim. Methods Softw.
[10] Elizabeth L. Wilmer, et al. Markov Chains and Mixing Times, 2008.
[11] Benjamin Recht, et al. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning, 2008, NIPS.
[12] A. Rahimi, et al. Uniform approximation of functions with random bases, 2008, 46th Annual Allerton Conference on Communication, Control, and Computing.
[13] M. Shaked, J. G. Shanthikumar. Stochastic Orders, 2008.
[14] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[15] Warren B. Powell, et al. Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics), 2007.
[16] Rémi Munos, et al. Performance Bounds in Lp-norm for Approximate Value Iteration, 2007, SIAM J. Control. Optim.
[17] Liming Xiang, et al. Kernel-Based Reinforcement Learning, 2006, ICIC.
[18] S. Smale, et al. Shannon sampling II: Connections to learning theory, 2005.
[19] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[20] Dimitri P. Bertsekas, et al. Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC, 2005, Eur. J. Control.
[21] Benjamin Van Roy, et al. On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming, 2004, Math. Oper. Res.
[22] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[23] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[24] Csaba Szepesvári, et al. Efficient approximate planning in continuous space Markovian Decision Problems, 2001, AI Commun.
[25] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[26] Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations, 1999.
[27] Vijay R. Konda, et al. Actor-Critic Algorithms, 1999, NIPS.
[28] R. DeVore, et al. Nonlinear approximation, 1998, Acta Numerica.
[29] David Haussler, et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension, 1995, J. Comb. Theory, Ser. A.
[30] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997, Econometrica.
[31] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.