[1] S. C. Jaquette. A Utility Criterion for Markov Decision Processes, 1976.
[2] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[3] Vivek S. Borkar, et al. Q-Learning for Risk-Sensitive Control, 2002, Math. Oper. Res.
[4] M. J. Sobel, et al. Discounted MDP's: distribution functions and exponential utility maximization, 1987.
[5] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[6] David Heath, et al. Coherent multiperiod risk adjusted values and Bellman's principle, 2007, Ann. Oper. Res.
[7] Steven I. Marcus, et al. Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes, 1999, Autom.
[8] Terrence J. Sejnowski, et al. TD(λ) Converges with Probability 1, 1994, Machine Learning.
[9] D. White. Mean, variance, and probabilistic criteria in finite Markov decision processes: A review, 1988.
[10] Łukasz Stettner, et al. Risk-Sensitive Control of Discrete-Time Markov Processes with Infinite Horizon, 1999, SIAM J. Control Optim.
[11] Philippe Artzner, et al. Coherent Measures of Risk, 1999.
[12] J. Michael Steele, et al. Markov Decision Problems Where Means Bound Variances, 2014, Oper. Res.
[13] John N. Tsitsiklis, et al. Algorithmic aspects of mean-variance optimization in Markov decision processes, 2013, Eur. J. Oper. Res.
[14] Susanne Klöppel, et al. Dynamic Indifference Valuation via Convex Risk Measures, 2007.
[15] J. Hiriart-Urruty, et al. Mean value theorems in nonsmooth analysis, 1980.
[16] Vivek S. Borkar, et al. A sensitivity formula for risk-sensitive cost and the actor-critic algorithm, 2001, Syst. Control Lett.
[17] Mohammad Ghavamzadeh, et al. Algorithms for CVaR Optimization in MDPs, 2014, NIPS.
[18] S. C. Jaquette. Markov Decision Processes with a New Optimality Criterion: Discrete Time, 1973.
[19] Andrzej Ruszczyński, et al. Risk measurement and risk-averse control of partially observable discrete-time Markov systems, 2018, Math. Methods Oper. Res.
[20] Shie Mannor, et al. Policy Gradients with Variance Related Risk Criteria, 2012, ICML.
[21] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[22] Shie Mannor, et al. Scaling Up Robust MDPs using Function Approximation, 2014, ICML.
[23] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[24] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[25] Darinka Dentcheva, et al. Risk forms: representation, disintegration, and application to partially observable two-stage systems, 2018, Math. Program.
[26] Özlem Çavuş, et al. Risk-Averse Control of Undiscounted Transient Markov Models, 2012, SIAM J. Control Optim.
[27] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[28] W. Fleming, et al. Optimal long term growth rate of expected utility of wealth, 1999.
[29] Andrzej Ruszczyński, et al. Risk-averse dynamic programming for Markov decision processes, 2010, Math. Program.
[30] Włodzimierz Ogryczak, et al. From stochastic dominance to mean-risk models: Semideviations as risk measures, 1999, Eur. J. Oper. Res.
[31] Leonard Rogers, et al. Valuations and Dynamic Convex Risk Measures, 2007, arXiv:0709.0232.
[32] Klaus Obermayer, et al. A Unified Framework for Risk-sensitive Markov Decision Processes with Finite State and Action Spaces, 2011, arXiv.
[33] Wann-Jiun Ma, et al. Risk-averse sensor planning using distributed policy gradient, 2017, 2017 American Control Conference (ACC).
[34] A. Ruszczyński, et al. Statistical estimation of composite risk functionals and risk optimization problems, 2015, arXiv:1504.02658.
[35] Steven D. Levitt, et al. On Modeling Risk in Markov Decision Processes, 2001.
[36] H. Föllmer, et al. Convex risk measures and the dynamics of their penalty functions, 2006.
[37] Jing Peng, et al. Incremental multi-step Q-learning, 1994, Machine Learning.
[38] R. Bellman. A Markovian Decision Process, 1957.
[39] Michael I. Jordan, et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[40] B. Roorda, et al. Coherent Acceptability Measures in Multiperiod Models, 2005.
[41] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[42] L. L. Wegge, et al. Mean value theorem for convex functions, 1974.
[43] F. Delbaen, et al. Dynamic Monetary Risk Measures for Bounded Discrete-Time Processes, 2004, arXiv:math/0410453.
[44] Jerzy A. Filar, et al. Variance-Penalized Markov Decision Processes, 1989, Math. Oper. Res.
[45] Patrick Cheridito, et al. Composition of Time-Consistent Dynamic Monetary Risk Measures in Discrete Time, 2011.
[46] Shie Mannor, et al. Sequential Decision Making With Coherent Risk, 2017, IEEE Transactions on Automatic Control.
[47] Zhiping Chen, et al. Time-consistent investment policies in Markovian markets: A case of mean–variance analysis, 2014.
[48] Carlos S. Kubrusly, et al. Stochastic approximation algorithms and applications, 1973, CDC 1973.
[49] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[50] Steven I. Marcus, et al. Dynamic programming with non-convex risk-sensitive measures, 2013, 2013 American Control Conference.
[51] Uriel G. Rothblum, et al. Optimal stopping, exponential utility, and linear programming, 1979, Math. Program.
[52] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[53] Warren B. Powell, et al. Approximate Dynamic Programming for Large-Scale Resource Allocation Problems, 2006.
[54] G. Pflug, et al. Modeling, Measuring and Managing Risk, 2008.
[55] A. Ruszczyński, et al. Process-based risk measures and risk-averse control of discrete-time systems, 2014, Math. Program.
[56] Panos M. Pardalos, et al. Approximate dynamic programming: solving the curses of dimensionality, 2009, Optim. Methods Softw.
[57] Nicole Bäuerle, et al. More Risk-Sensitive Markov Decision Processes, 2014, Math. Oper. Res.
[58] R. Bellman, et al. Polynomial approximation—a new computational technique in dynamic programming: Allocation processes, 1963.
[59] Özlem Çavuş, et al. Computational Methods for Risk-Averse Undiscounted Transient Markov Models, 2014, Oper. Res.
[60] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[61] Anna Jaśkiewicz, et al. Persistently Optimal Policies in Stochastic Dynamic Programming with Generalized Discounting, 2013, Math. Oper. Res.
[62] Alexander Shapiro, et al. Conditional Risk Mappings, 2005, Math. Oper. Res.
[63] Alexander Shapiro, et al. Optimization of Convex Risk Functions, 2006, Math. Oper. Res.
[64] A. Ruszczyński, et al. Stochastic approximation method with gradient averaging for unconstrained problems, 1983.
[65] U. Rieder, et al. Markov Decision Processes, 2010.
[66] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[67] Frank Riedel, et al. Dynamic Coherent Risk Measures, 2003.
[68] Daniel Hernández-Hernández, et al. Risk sensitive control of finite state Markov chains in discrete time, with applications to portfolio management, 1999, Math. Methods Oper. Res.
[69] Daniel Hernández-Hernández, et al. Risk Sensitive Markov Decision Processes, 1997.
[70] W. A. Clark, et al. Simulation of self-organizing systems by digital computer, 1954, Trans. IRE Prof. Group Inf. Theory.
[71] Gavin Adrian Rummery. Problem solving with reinforcement learning, 1995.