Batch mode reinforcement learning based on the synthesis of artificial trajectories
Louis Wehenkel | Susan A. Murphy | Damien Ernst | Raphaël Fonteneau
[1] J. Robins. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect , 1986 .
[2] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[3] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[4] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse , 1996 .
[5] Leslie Pack Kaelbling,et al. Recent Advances in Reinforcement Learning , 1996, Springer US.
[6] Andrew G. Barto,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 1996, Machine Learning.
[7] Andrew G. Barto,et al. Reinforcement learning , 1998 .
[8] Richard S. Sutton,et al. Dimensions of Reinforcement Learning , 1998 .
[9] Geoffrey J. Gordon,et al. Approximate solutions to Markov decision processes , 1999 .
[10] J M Robins,et al. Marginal Mean Models for Dynamic Regimes , 2001, Journal of the American Statistical Association.
[11] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[12] S. Murphy,et al. Optimal dynamic treatment regimes , 2003 .
[13] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[14] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[15] Pierre Geurts,et al. Iteratively Extending Time Horizon Reinforcement Learning , 2003, ECML.
[16] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[17] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[18] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[19] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[20] Louis Wehenkel,et al. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach , 2006, Proceedings of the 45th IEEE Conference on Decision and Control.
[21] Raphaël Marée,et al. Reinforcement Learning with Raw Image Pixels as Input State , 2006, IWICPAS.
[22] Raphaël Marée,et al. Reinforcement learning with raw image pixels as state input , 2006 .
[23] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[24] Daniele de Rigo,et al. Neuro-dynamic programming for designing water reservoir network management policies , 2007 .
[25] Andrew Zisserman,et al. Advances in Neural Information Processing Systems (NIPS) , 2007 .
[26] S. Timmer,et al. Fitted Q Iteration with CMACs , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[27] Joelle Pineau,et al. Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning , 2008, AAAI.
[28] Louis Wehenkel,et al. Risk-aware decision making and dynamic programming , 2008 .
[29] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[30] Andrea Bonarini,et al. Batch Reinforcement Learning for Controlling a Mobile Wheeled Pendulum Robot , 2008, IFIP AI.
[31] B. Chakraborty. Bias Correction and Confidence Intervals for Fitted Q-iteration , 2008 .
[32] Louis Wehenkel,et al. Inferring bounds on the performance of a control policy from a sample of trajectories , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[33] Shie Mannor,et al. Regularized Fitted Q-iteration: Application to Planning , 2008, EWRL.
[34] Max Bramer,et al. Artificial Intelligence in Theory and Practice II , 2009 .
[35] Sergio M. Savaresi,et al. Batch Reinforcement Learning for semi-active suspension control , 2009, 2009 IEEE Control Applications, (CCA) & Intelligent Control, (ISIC).
[36] Louis Wehenkel,et al. Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[37] Louis Wehenkel,et al. Model-Free Monte Carlo-like Policy Evaluation , 2010, AISTATS.
[38] Alessandro Lazaric,et al. Finite-Sample Analysis of LSTD , 2010, ICML.
[39] Masashi Sugiyama,et al. Parametric Return Density Estimation for Reinforcement Learning , 2010, UAI.
[40] Marcello Restelli,et al. Tree‐based reinforcement learning for optimal water reservoir operation , 2010 .
[41] Bart De Schutter,et al. Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .
[42] Louis Wehenkel,et al. Generating Informative Trajectories by using Bounds on the Return of Control Policies , 2010 .
[43] Martin A. Riedmiller,et al. Deep learning of visual control policies , 2010, ESANN.
[44] Louis Wehenkel,et al. A Cautious Approach to Generalization in Reinforcement Learning , 2010, ICAART.
[45] Louis Wehenkel,et al. Towards Min Max Generalization in Reinforcement Learning , 2010, ICAART.
[46] Raphaël Fonteneau,et al. Contributions to Batch Mode Reinforcement Learning , 2011 .
[47] Olivier Pietquin,et al. Batch reinforcement learning for optimizing longitudinal driving assistance strategies , 2011, 2011 IEEE Symposium on Computational Intelligence in Vehicles and Transportation Systems (CIVTS) Proceedings.
[48] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..