Finite-time bounds for sampling-based fitted value iteration
[1] R. Bellman, et al. Functional Approximations and Dynamic Programming, 1959.
[2] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[3] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[4] E. Cheney. Introduction to Approximation Theory, 1966.
[5] A. L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers. II: Recent Progress, 1967.
[6] G. Wahba, et al. Some Results on Tchebycheffian Spline Functions, 1971.
[7] Vladimir Vapnik, Alexey Chervonenkis. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities, 1971.
[8] Norbert Sauer, et al. On the Density of Families of Sets, 1972, J. Comb. Theory A.
[9] Thomas L. Morin, et al. Computational Advances in Dynamic Programming, 1978.
[10] C. J. Stone, et al. Optimal Rates of Convergence for Nonparametric Estimators, 1980.
[11] C. J. Stone, et al. Optimal Global Rates of Convergence for Nonparametric Regression, 1982.
[12] J. Tsitsiklis, et al. An Optimal Multigrid Algorithm for Continuous State Discrete Time Stochastic Control, 1988, Proceedings of the 27th IEEE Conference on Decision and Control.
[13] John N. Tsitsiklis, et al. The Complexity of Dynamic Programming, 1989, J. Complex.
[14] P. Bougerol, et al. Strict Stationarity of Generalized Autoregressive Processes, 1992.
[15] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[16] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997.
[17] M. Talagrand. Sharper Bounds for Gaussian and Empirical Processes, 1994.
[18] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[19] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[20] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, CACM.
[21] David Haussler, et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension, 1995, J. Comb. Theory, Ser. A.
[22] I. Johnstone, et al. Adapting to Unknown Smoothness via Wavelet Shrinkage, 1995.
[23] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[24] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[25] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[26] Peter L. Bartlett, et al. Efficient Agnostic Learning of Neural Networks with Bounded Fan-in, 1996, IEEE Trans. Inf. Theory.
[27] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[28] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[29] R. DeVore, et al. Nonlinear Approximation, 1998, Acta Numerica.
[30] O. Linton, et al. The Asymptotic Distribution of Nonparametric Estimates of the Lyapunov Exponent for Stochastic Time Series, 1999.
[31] John N. Tsitsiklis, et al. Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing High-Dimensional Financial Derivatives, 1999, IEEE Trans. Autom. Control.
[32] Peter L. Bartlett, et al. Learning in Neural Networks: Theoretical Foundations, 1999.
[33] Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations, 1999.
[34] Thomas G. Dietterich, et al. Efficient Value Function Approximation Using Regression Trees, 1999.
[35] Federico Girosi, et al. Generalization Bounds for Function Approximation from Scattered Noisy Data, 1999, Adv. Comput. Math.
[36] Nello Cristianini, et al. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, 2000.
[37] Michael I. Jordan, et al. PEGASUS: A Policy Search Method for Large MDPs and POMDPs, 2000, UAI.
[38] Francis A. Longstaff, et al. Valuing American Options by Simulation: A Simple Least-Squares Approach, 2001.
[39] Bernhard Schölkopf, et al. Learning with Kernels, 2001.
[40] Csaba Szepesvári, et al. Efficient Approximate Planning in Continuous Space Markovian Decision Problems, 2001, AI Commun.
[41] Xin Wang, et al. Batch Value Function Approximation via Support Vectors, 2001, NIPS.
[42] Tong Zhang, et al. Covering Number Bounds of Certain Regularized Linear Function Classes, 2002, J. Mach. Learn. Res.
[43] Shie Mannor, et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.
[44] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[45] Adam Krzyzak, et al. A Distribution-Free Theory of Nonparametric Regression, 2002, Springer Series in Statistics.
[46] Rémi Munos, et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[47] M. Haugh. Duality Theory and Simulation in Financial Engineering, 2003, Proceedings of the 2003 Winter Simulation Conference.
[48] Sham M. Kakade, et al. On the Sample Complexity of Reinforcement Learning, 2003.
[49] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[50] Abhijit Gosavi, et al. A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis, 2004, Machine Learning.
[51] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[52] Thomas Uthmann, et al. Experiments in Value Function Approximation with Sparse Support Vector Regression, 2004, ECML.
[53] John N. Tsitsiklis, et al. Feature-Based Methods for Large Scale Dynamic Programming, 2004, Machine Learning.
[54] William D. Smart, et al. Interpolation-Based Q-Learning, 2004, ICML.
[55] I. Johnstone, et al. Adapting to Unknown Sparsity by Controlling the False Discovery Rate, 2005, math/0505374.
[56] Martin A. Riedmiller. Neural Fitted Q Iteration: First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[57] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[58] Rémi Munos, et al. Error Bounds for Approximate Value Iteration, 2005, AAAI.
[59] Susan A. Murphy, et al. A Generalization Error for Q-Learning, 2005, J. Mach. Learn. Res.
[60] A. Krzyżak, et al. Adaptive Regression Estimation with Multilayer Feedforward Neural Networks, 2005.
[61] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, COLT.
[62] Hao Helen Zhang, et al. Component Selection and Smoothing in Multivariate Nonparametric Regression, 2006, math/0702659.
[63] Dimitri P. Bertsekas, et al. Stochastic Optimal Control: The Discrete Time Case, 2007.
[64] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, Machine Learning.
[65] Abhijit Gosavi, et al. Self-Improving Factory Simulation Using Continuous-Time Average-Reward Reinforcement Learning, 2007.