Regularized Fitted Q-iteration: Application to Planning
Shie Mannor | Csaba Szepesvári | Mohammad Ghavamzadeh | Amir Massoud Farahmand
[1] Dimitri P. Bertsekas, et al. Stochastic optimal control: the discrete time case, 2007.
[2] Adam Krzyzak, et al. A Distribution-Free Theory of Nonparametric Regression, 2002, Springer series in statistics.
[3] M. Loth, et al. Sparse Temporal Difference Learning Using LASSO, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[4] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[5] Shie Mannor, et al. The kernel recursive least-squares algorithm, 2004, IEEE Transactions on Signal Processing.
[6] Ding-Xuan Zhou, et al. Capacity of reproducing kernel spaces in learning theory, 2003, IEEE Transactions on Information Theory.
[7] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[8] Alexander J. Smola, et al. Learning with kernels, 1998.
[9] Lihong Li, et al. Analyzing feature generation for value-function approximation, 2007, ICML '07.
[10] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[11] A. Tsybakov, et al. Sparsity oracle inequalities for the Lasso, 2007, 0705.3308.
[12] Shai Ben-David, et al. Learning Bounds for Support Vector Machines with Learned Kernels, 2006, COLT.
[13] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[14] Daniel Polani, et al. Least Squares SVM for Least Squares TD Learning, 2006, ECAI.
[15] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[16] Shie Mannor, et al. Regularized Policy Iteration, 2008, NIPS.
[17] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[18] Xin Xu, et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, 2007, IEEE Transactions on Neural Networks.
[19] Csaba Szepesvári, et al. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory, 2007.
[20] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[21] Shie Mannor, et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning, 2005, Ann. Oper. Res.
[22] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.