Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Shangtong Zhang | Bo Liu | Shimon Whiteson