Alec Koppel | Mengdi Wang | Amrit Singh Bedi | Junyu Zhang
[1] Shie Mannor, et al. Policy Gradient for Coherent Risk Measures, 2015, NIPS.
[2] Shie Mannor, et al. Nonlinear Distributional Gradient Temporal-Difference Learning, 2018, ICML.
[3] Tao Wang, et al. Stable Dual Dynamic Programming, 2007, NIPS.
[4] Francis Bach, et al. A Universal Algorithm for Variational Inequalities Adaptive to Smoothness and Noise, 2019, COLT.
[5] Tie-Yan Liu, et al. Fully Parameterized Quantile Function for Distributional Reinforcement Learning, 2019, NeurIPS.
[6] Rémi Munos, et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.
[7] Andrzej Ruszczynski, et al. Risk-averse dynamic programming for Markov decision processes, 2010, Math. Program.
[8] Ofir Nachum, et al. A Lyapunov-based Approach to Safe Reinforcement Learning, 2018, NeurIPS.
[9] Tomas Björk, et al. A General Theory of Markovian Time Inconsistent Stochastic Control Problems, 2010.
[10] Dimitri P. Bertsekas, et al. Stochastic Optimal Control: The Discrete Time Case, 2007.
[11] R. Rockafellar, et al. Conditional Value-at-Risk for General Loss Distributions, 2001.
[12] Lihong Li, et al. Scalable Bilinear π Learning Using State and Action Features, 2018, ICML.
[13] William B. Haskell, et al. A Convex Analytic Approach to Risk-Aware Markov Decision Processes, 2015, SIAM J. Control. Optim.
[14] Bo Liu, et al. A Block Coordinate Ascent Algorithm for Mean-Variance Optimization, 2018, NeurIPS.
[15] A. S. Manne. Linear Programming and Sequential Decisions, 1960.
[16] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[17] Mengdi Wang, et al. Randomized Linear Programming Solves the Markov Decision Problem in Nearly Linear (Sometimes Sublinear) Time, 2020, Math. Oper. Res.
[18] Shie Mannor, et al. Policy Gradients with Variance Related Risk Criteria, 2012, ICML.
[19] Alexandros Karatzoglou, et al. Learning to rank for recommender systems, 2013, RecSys.
[20] Mladen Kolar, et al. Convergent Policy Optimization for Safe Reinforcement Learning, 2019, NeurIPS.
[21] Pieter Abbeel, et al. Constrained Policy Optimization, 2017, ICML.
[22] Josefa Mula, et al. Quantitative models for supply chain planning under uncertainty: a review, 2009.
[23] André Roca, et al. Identifying the processes underpinning anticipation and decision-making in a dynamic time-constrained task, 2011, Cognitive Processing.
[24] Alec Koppel, et al. Beyond Cumulative Returns via Reinforcement Learning over State-Action Occupancy Measures, 2021, American Control Conference (ACC).
[25] Miguel Á. Carreira-Perpiñán, et al. Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application, 2013, ArXiv.
[26] Shie Mannor, et al. Bayesian Reinforcement Learning: A Survey, 2015, Found. Trends Mach. Learn.
[27] F. d'Epenoux. A Probabilistic Production and Inventory Problem, 1963.
[28] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[29] Mohammad Ghavamzadeh, et al. Variance-constrained actor-critic algorithms for discounted and average reward MDPs, 2014, Machine Learning.
[30] Tomas Björk, et al. A theory of Markovian time-inconsistent stochastic control in discrete time, 2014, Finance Stochastics.
[31] Marco Pavone, et al. Risk-Constrained Reinforcement Learning with Percentile Risk Criteria, 2015, J. Mach. Learn. Res.
[32] L. A. Prashanth, et al. Policy Gradients for CVaR-Constrained MDPs, 2014, ArXiv:1405.2690.
[33] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[34] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[35] V. Krishnamurthy, et al. Implementation of gradient estimation to a constrained Markov decision problem, 2003, 42nd IEEE Conference on Decision and Control (CDC).
[36] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[37] Alejandro Ribeiro, et al. Learning Safe Policies via Primal-Dual Methods, 2019, 58th IEEE Conference on Decision and Control (CDC).
[38] Warren B. Powell, et al. Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures, 2015, Math. Oper. Res.
[39] Alexander Shapiro, et al. Convex Approximations of Chance Constrained Programs, 2006, SIAM J. Optim.
[40] Doina Precup, et al. Exponentiated Gradient Methods for Reinforcement Learning, 1997, ICML.
[41] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[42] Philippe Artzner, et al. Coherent Measures of Risk, 1999.
[43] Sabrina M. Tom, et al. The Neural Basis of Loss Aversion in Decision-Making Under Risk, 2007, Science.
[44] C. Sims. Implications of rational inattention, 2003.
[45] Mengdi Wang, et al. Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning, 2016, ArXiv.
[46] John N. Tsitsiklis, et al. Mean-Variance Optimization in Markov Decision Processes, 2011, ICML.
[47] Mengdi Wang, et al. Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time, 2017, ArXiv:1704.01869.
[48] Mengdi Wang, et al. Primal-Dual π Learning: Sample Complexity and Sublinear Run Time for Ergodic Markov Decision Problems, 2017, ArXiv.
[49] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[50] L. A. Prashanth. Policy Gradients for CVaR-Constrained MDPs, 2014, ALT.
[51] Andrzej Ruszczynski, et al. Risk-Averse Learning by Temporal Difference Methods, 2020, ArXiv.
[52] Christoph Dann, et al. Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy, 2020, AAAI.
[53] G. Hunanyan. Portfolio Selection, 2019, Finanzwirtschaft, Banken und Bankmanagement I Finance, Banks and Bank Management.