Risk-Sensitive Reinforcement Learning
[1] J. Neumann, et al. Theory of games and economic behavior, 1945, 100 Years of Math Milestones.
[2] Stuart E. Dreyfus, et al. Applied Dynamic Programming, 1965.
[3] R. Howard, et al. Risk-Sensitive Markov Decision Processes, 1972.
[4] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[5] E. Elton. Modern portfolio theory and investment analysis, 1981.
[6] C. Watkins. Learning from delayed rewards, 1989.
[7] J. Pratt. Risk Aversion in the Small and in the Large, 1964.
[8] Reid G. Simmons, et al. Risk-Sensitive Planning with Probabilistic Decision Graphs, 1994, KR.
[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[10] Matthias Heger, et al. Consideration of Risk in Reinforcement Learning, 1994, ICML.
[11] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[12] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[13] Thomas G. Dietterich, et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network, 1995, NIPS 1995.
[15] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[16] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[17] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.
[18] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[19] John N. Tsitsiklis, et al. Approximate Solutions to Optimal Stopping Problems, 1996, NIPS.
[20] T. Basar, et al. H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach, 1996, IEEE Trans. Autom. Control.
[21] Ralph Neuneier, et al. Enhancing Q-Learning for Optimal Asset Allocation, 1997, NIPS.
[22] Csaba Szepesvári. Non-Markovian Policies in Sequential Decision Problems, 1998, Acta Cybern.
[23] S. Marcus, et al. Risk-Sensitive, Minimax, and Mixed Risk-Neutral/Minimax Control of Markov Decision Processes, 1999.
[24] John N. Tsitsiklis, et al. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, 1999, IEEE Trans. Autom. Control.
[25] John N. Tsitsiklis, et al. Call admission control and routing in integrated services networks using neuro-dynamic programming, 2000, IEEE Journal on Selected Areas in Communications.
[26] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.