Risk-Averse Offline Reinforcement Learning
