Finite-Time Bounds for Fitted Value Iteration
[1] R. Bellman et al. Functional Approximations and Dynamic Programming, 1959.
[2] Arthur L. Samuel et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[3] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[4] A. L. Samuel et al. Some Studies in Machine Learning Using the Game of Checkers. II: Recent Progress, 1967.
[5] G. Wahba et al. Some Results on Tchebycheffian Spline Functions, 1971.
[6] Vladimir Vapnik and Alexey Chervonenkis. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities, 1971.
[7] Norbert Sauer et al. On the Density of Families of Sets, 1972, J. Comb. Theory A.
[8] Thomas L. Morin et al. Computational Advances in Dynamic Programming, 1978.
[9] C. J. Stone et al. Optimal Rates of Convergence for Nonparametric Estimators, 1980.
[10] C. J. Stone et al. Optimal Global Rates of Convergence for Nonparametric Regression, 1982.
[11] J. Tsitsiklis et al. An Optimal Multigrid Algorithm for Continuous State Discrete Time Stochastic Control, 1988, Proceedings of the 27th IEEE Conference on Decision and Control.
[12] John N. Tsitsiklis et al. The Complexity of Dynamic Programming, 1989, J. Complex.
[13] P. Bougerol et al. Strict Stationarity of Generalized Autoregressive Processes, 1992.
[14] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[15] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997.
[16] Michael I. Jordan et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[17] Gerald Tesauro et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[18] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[19] David Haussler et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension, 1995, J. Comb. Theory, Ser. A.
[20] Andrew G. Barto et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[21] Wei Zhang et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[22] Leemon C. Baird et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[23] Peter L. Bartlett et al. Efficient Agnostic Learning of Neural Networks with Bounded Fan-in, 1996, IEEE Trans. Inf. Theory.
[24] John Rust. Numerical Dynamic Programming in Economics, 1996.
[25] John N. Tsitsiklis et al. Neuro-Dynamic Programming, 1996.
[26] Dimitri P. Bertsekas et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[27] R. DeVore et al. Nonlinear Approximation, 1998, Acta Numerica.
[28] Alexander J. Smola et al. Learning with Kernels, 1998.
[29] Peter L. Bartlett et al. Learning in Neural Networks: Theoretical Foundations, 1999.
[30] Peter L. Bartlett et al. Neural Network Learning: Theoretical Foundations, 1999.
[31] Thomas G. Dietterich et al. Efficient Value Function Approximation Using Regression Trees, 1999.
[32] Federico Girosi et al. Generalization Bounds for Function Approximation from Scattered Noisy Data, 1999, Adv. Comput. Math.
[33] Nello Cristianini et al. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, 2000.
[34] Michael I. Jordan et al. PEGASUS: A Policy Search Method for Large MDPs and POMDPs, 2000, UAI.
[35] John N. Tsitsiklis et al. Regression Methods for Pricing Complex American-Style Options, 2001, IEEE Trans. Neural Networks.
[36] Francis A. Longstaff et al. Valuing American Options by Simulation: A Simple Least-Squares Approach, 2001.
[37] Tong Zhang et al. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, 2001, AI Mag.
[38] Csaba Szepesvári et al. Efficient Approximate Planning in Continuous Space Markovian Decision Problems, 2001, AI Commun.
[39] Xin Wang et al. Batch Value Function Approximation via Support Vectors, 2001, NIPS.
[40] Tong Zhang et al. Covering Number Bounds of Certain Regularized Linear Function Classes, 2002, J. Mach. Learn. Res.
[41] Shie Mannor et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.
[42] John Langford et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[43] Adam Krzyzak et al. A Distribution-Free Theory of Nonparametric Regression, 2002, Springer Series in Statistics.
[44] Martin B. Haugh et al. New Simulation Methodology for Finance: Duality Theory and Simulation in Financial Engineering, 2003, WSC '03.
[45] Rémi Munos et al. Error Bounds for Approximate Policy Iteration, 2003, ICML.
[46] Sham M. Kakade et al. On the Sample Complexity of Reinforcement Learning, 2003.
[47] P. Sánchez et al. Duality Theory and Simulation in Financial Engineering, 2003.
[48] Michail G. Lagoudakis et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[49] Abhijit Gosavi et al. A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis, 2004, Machine Learning.
[50] Yishay Mansour et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[51] Thomas Uthmann et al. Experiments in Value Function Approximation with Sparse Support Vector Regression, 2004, ECML.
[52] John N. Tsitsiklis et al. Feature-Based Methods for Large Scale Dynamic Programming, 2004, Machine Learning.
[53] William D. Smart et al. Interpolation-Based Q-Learning, 2004, ICML.
[54] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[55] Csaba Szepesvári et al. Finite Time Bounds for Sampling Based Fitted Value Iteration, 2005, ICML.
[56] Pierre Geurts et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[57] Rémi Munos et al. Error Bounds for Approximate Value Iteration, 2005, AAAI.
[58] Susan A. Murphy et al. A Generalization Error for Q-Learning, 2005, J. Mach. Learn. Res.
[59] Liming Xiang et al. Kernel-Based Reinforcement Learning, 2006, ICIC.
[60] Csaba Szepesvári et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, COLT.
[61] A. Antos et al. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory, 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[62] Dimitri P. Bertsekas et al. Stochastic Optimal Control: The Discrete Time Case, 2007.
[63] Csaba Szepesvári et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, Machine Learning.
[64] Abhijit Gosavi et al. Self-Improving Factory Simulation Using Continuous-Time Average-Reward Reinforcement Learning, 2007.