Risk-Averse Offline Reinforcement Learning