A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions
[1] L. S. Pontryagin,et al. Mathematical Theory of Optimal Processes , 1962 .
[2] E. Blum,et al. The Mathematical Theory of Optimal Processes. , 1963 .
[3] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[4] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[5] G. Barles,et al. Exit Time Problems in Optimal Control and Vanishing Viscosity Method , 1988 .
[6] Marianne Akian. Méthodes multigrilles en contrôle stochastique , 1990 .
[7] Richard S. Sutton,et al. Neural networks for control , 1990 .
[8] G. Barles,et al. Comparison principle for dirichlet-type Hamilton-Jacobi equations and singular perturbations of degenerated elliptic equations , 1990 .
[9] G. Barles,et al. Convergence of approximation schemes for fully nonlinear second order equations , 1990, 29th IEEE Conference on Decision and Control.
[10] Andrew G. Barto,et al. Connectionist learning for control: an overview , 1990 .
[11] A. Moore. Variable Resolution Dynamic Programming , 1991, ML.
[12] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[13] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[14] W. Fleming,et al. Controlled Markov processes and viscosity solutions , 1992 .
[15] Vijaykumar Gullapalli,et al. Reinforcement learning and its application to control , 1992 .
[16] P. Lions,et al. User’s guide to viscosity solutions of second order partial differential equations , 1992, math/9207212.
[17] Richard S. Sutton,et al. Online Learning with Random Representations , 1993, ICML.
[18] Andrew W. Moore,et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces , 2004, Machine Learning.
[19] Eduardo D. Sontag,et al. Neural Networks for Control , 1993 .
[20] G. Barles. Solutions de viscosité des équations de Hamilton-Jacobi , 1994 .
[21] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[22] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[23] J. Quadrat. Numerical methods for stochastic control problems in continuous time , 1994 .
[24] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[25] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[26] Kenji Doya,et al. Temporal Difference Learning in Continuous Time and Space , 1995, NIPS.
[27] A. Harry Klopf,et al. Reinforcement Learning Applied to a Differential Game , 1995, Adapt. Behav..
[28] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
[29] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[30] 김재현,et al. Fuzzy-Q learning , 1996 .
[31] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[32] Rémi Munos,et al. A Convergent Reinforcement Learning Algorithm in the Continuous Case: The Finite-Element Reinforcement Learning , 1996, ICML.
[33] Stephan Pareigis,et al. Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes , 1996, NIPS.
[34] Nicolas Meuleau. Le dilemme entre exploration et exploitation dans l'apprentissage par renforcement : optimisation adaptative des modeles de decision multi-etats , 1996 .
[35] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[36] M. Hasselmo,et al. Temporal Difference Learning in Continuous Time and Space , 1996 .
[37] Rémi Munos,et al. Finite-Element Methods with Local Triangulation Refinement for Continuous Reinforcement Learning Problems , 1997, ECML.
[38] Hugues Bersini,et al. A simplification of the backpropagation-through-time algorithm for optimal neurocontrol , 1997, IEEE Trans. Neural Networks.
[39] Rémi Munos,et al. Reinforcement Learning for Continuous Stochastic Control Problems , 1997, NIPS.
[40] Rémi Munos,et al. A Convergent Reinforcement Learning Algorithm in the Continuous Case: The Finite-Element Reinforcement Learning , 1997 .
[41] Stephan Pareigis,et al. Adaptive Choice of Grid and Time in Reinforcement Learning , 1997, NIPS.
[42] Rémi Munos. Apprentissage par renforcement, étude du cas continu , 1997 .
[43] Rémi Munos,et al. A Convergent Reinforcement Learning Algorithm in the Continuous Case Based on a Finite Difference Method , 1997, IJCAI.
[44] Andrew W. Moore,et al. Barycentric Interpolators for Continuous Space and Time Reinforcement Learning , 1998, NIPS.
[45] P. Dupuis,et al. Rates of Convergence for Approximation Schemes in Optimal Control , 1998 .
[46] Rémi Munos,et al. A General Convergence Method for Reinforcement Learning in the Continuous Case , 1998, ECML.
[47] Andrew W. Moore,et al. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems , 1999, IJCAI.
[48] Andrew W. Moore,et al. Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation , 1999, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339).
[49] H. Kushner. Numerical Methods for Stochastic Control Problems in Continuous Time , 2000 .
[50] M. K. Ali,et al. Fuzzy Reinforcement Learning , 2002, Fuzzy Logic Theory and Applications.
[51] Andrew W. Moore,et al. The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces , 1993, Machine Learning.
[52] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[53] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[54] Rémi Munos. Finite-Element Methods with Local Triangulation Refinement for Continuous Reinforcement Learning Problems , 2005 .
[55] M. Griebel. Adaptive sparse grid multilevel methods for elliptic PDEs based on finite differences , .