Optimizing the CVaR via Sampling