Risk-Averse Offline Reinforcement Learning
