Boosted Fitted Q-Iteration
Marcello Restelli | Carlo D'Eramo | Matteo Pirotta | Samuele Tosatto
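The title refers to a boosting-style variant of fitted Q-iteration. For orientation, the sketch below shows one way such an algorithm might look: the standard fitted Q-iteration loop (cf. Riedmiller's Neural Fitted Q Iteration and the tree-based batch-mode RL of Geurts et al., both cited below) combined with an additive update in the spirit of Friedman's gradient boosting, where each iteration fits a weak regressor to the empirical Bellman residual and adds it to the running Q estimate. This is a minimal illustrative sketch under those assumptions, not the authors' implementation; the function and variable names, the choice of DecisionTreeRegressor as weak learner, and all hyperparameters are hypothetical.

# Illustrative sketch of a boosted fitted Q-iteration loop. Each iteration
# fits a weak regressor to the empirical Bellman residual and adds it to
# the additive Q estimate. Names and hyperparameters are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosted_fqi(transitions, n_actions, n_iterations=50, gamma=0.99):
    """transitions: iterable of (state, action, reward, next_state) tuples,
    states as 1-D float arrays, actions as integers in [0, n_actions)."""
    s = np.array([t[0] for t in transitions], dtype=float)
    a = np.array([t[1] for t in transitions], dtype=float)
    r = np.array([t[2] for t in transitions], dtype=float)
    s2 = np.array([t[3] for t in transitions], dtype=float)
    X = np.column_stack([s, a])  # regress Q on (state, action) pairs
    models = []                  # additive ensemble: Q = sum of weak fits

    def q_values(states):
        """Q(states, a) for every action; shape (n_samples, n_actions)."""
        out = np.zeros((len(states), n_actions))
        for act in range(n_actions):
            Xa = np.column_stack([states, np.full(len(states), float(act))])
            for m in models:
                out[:, act] += m.predict(Xa)
        return out

    for _ in range(n_iterations):
        # Empirical Bellman target: r + gamma * max_a' Q(s', a')
        target = r + gamma * q_values(s2).max(axis=1) if models else r
        # Fit a weak learner to the residual between target and current Q.
        current = np.sum([m.predict(X) for m in models], axis=0) if models else 0.0
        weak = DecisionTreeRegressor(max_depth=3)
        weak.fit(X, target - current)
        models.append(weak)
    return q_values

A greedy policy is then obtained by taking the argmax over the returned q_values for a given state.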
[1] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[2] R. Tibshirani, et al. Generalized Additive Models, 1991.
[3] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[5] Matthew Saffell, et al. Reinforcement Learning for Trading, 1998, NIPS.
[6] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[7] Kenji Doya. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[8] J. Wellner, et al. Preservation Theorems for Glivenko-Cantelli and Uniform Glivenko-Cantelli Classes, 2000.
[9] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.
[10] L. Györfi, et al. A Distribution-Free Theory of Nonparametric Regression, 2002, Springer Series in Statistics.
[11] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[12] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[13] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[14] Pierre Geurts, et al. Extremely randomized trees, 2006, Machine Learning.
[15] Csaba Szepesvári, et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, 2006, Machine Learning.
[16] Lihong Li, et al. Analyzing feature generation for value-function approximation, 2007, ICML.
[17] Sridhar Mahadevan, et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, 2007, J. Mach. Learn. Res.
[18] Peter Bühlmann, et al. Boosting Algorithms: Regularization, Prediction and Model Fitting, 2007, arXiv:0804.2752.
[19] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[20] Shie Mannor, et al. Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems, 2009, American Control Conference.
[21] Alessandro Lazaric, et al. Finite-sample Analysis of Bellman Residual Minimization, 2010, ACML.
[22] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[23] Csaba Szepesvári, et al. Regularization in reinforcement learning, 2011.
[24] Doina Precup, et al. Value Pursuit Iteration, 2012, NIPS.
[25] Joelle Pineau, et al. Bellman Error Based Feature Generation using Random Projections on Sparse Spaces, 2013, NIPS.
[26] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.
[27] Matthieu Geist, et al. Boosted Bellman Residual Minimization Handling Expert Demonstrations, 2014, ECML/PKDD.
[28] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[29] Fernando Diaz, et al. Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains, 2016, arXiv.
[30] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[31] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.