[1] Alejandro Ribeiro,et al. Online Learning of Feasible Strategies in Unknown Environments , 2016, IEEE Transactions on Automatic Control.
[2] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[3] Marcus Hutter,et al. Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures , 2002, COLT.
[4] Benjamin Recht,et al. Simple random search provides a competitive approach to reinforcement learning , 2018, ArXiv.
[5] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[6] Steven I. Marcus,et al. Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes , 1999, Autom..
[7] H. Robbins. A Stochastic Approximation Method , 1951 .
[8] Pieter Abbeel,et al. Safe Exploration in Markov Decision Processes , 2012, ICML.
[9] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[10] Dimitri P. Bertsekas,et al. Nonlinear Programming , 1997 .
[11] R. Durrett. Probability: Theory and Examples , 1993 .
[12] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..
[13] Jooyoung Park,et al. Universal Approximation Using Radial-Basis-Function Networks , 1991, Neural Computation.
[14] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[15] Peter Geibel,et al. Reinforcement Learning for MDPs with Constraints , 2006, ECML.
[16] Ralph Neuneier,et al. Risk-Sensitive Reinforcement Learning , 1998, Machine Learning.
[17] Alejandro Ribeiro,et al. Stochastic Policy Gradient Ascent in Reproducing Kernel Hilbert Spaces , 2018, IEEE Transactions on Automatic Control.
[18] Miklós Rásonyi,et al. On Utility Maximization in Discrete-Time Financial Market Models , 2005 .
[19] Dimitri P. Bertsekas,et al. Convex Optimization Algorithms , 2015 .
[20] Fritz Wysotzki,et al. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints , 2005, J. Artif. Intell. Res..
[21] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[22] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[23] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[24] Masami Yasuda,et al. Discounted Markov decision processes with utility constraints , 2006, Comput. Math. Appl..
[25] Torsten Koller,et al. Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning , 2019, ArXiv.
[26] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[27] Daniel Hernández-Hernández,et al. Risk Sensitive Markov Decision Processes , 1997 .
[28] Shalabh Bhatnagar,et al. An Online Actor–Critic Algorithm with Function Approximation for Constrained Markov Decision Processes , 2012, J. Optim. Theory Appl..
[29] Yuval Tassa,et al. Safe Exploration in Continuous Action Spaces , 2018, ArXiv.
[30] Stefanie Jegelka,et al. ResNet with one-neuron hidden layers is a Universal Approximator , 2018, NeurIPS.
[31] Ken-ichi Funahashi,et al. On the approximate realization of continuous mappings by neural networks , 1989, Neural Networks.
[32] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..
[33] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[34] Fernando Fernández,et al. A comprehensive survey on safe reinforcement learning , 2015 .
[35] Liwei Wang,et al. The Expressive Power of Neural Networks: A View from the Width , 2017, NIPS.
[36] Shie Mannor,et al. Policy Gradients with Variance Related Risk Criteria , 2012, ICML.
[37] Marco Pavone,et al. Risk-Constrained Reinforcement Learning with Percentile Risk Criteria , 2015, J. Mach. Learn. Res..
[38] Qing Ling,et al. An Online Convex Optimization Approach to Proactive Network Resource Allocation , 2017, IEEE Transactions on Signal Processing.
[39] David Q. Mayne,et al. Constrained model predictive control: Stability and optimality , 2000, Autom..
[40] Matthias Heger,et al. Consideration of Risk in Reinforcement Learning , 1994, ICML.
[41] D. Bertsekas,et al. Alternative theoretical frameworks for finite horizon discrete-time stochastic optimal control , 1977, 1977 IEEE Conference on Decision and Control including the 16th Symposium on Adaptive Processes and A Special Symposium on Fuzzy Set Theory and Applications.
[42] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.
[43] Klaus Obermayer,et al. Risk-Sensitive Reinforcement Learning , 2013, Neural Computation.
[45] Shie Mannor,et al. Reward Constrained Policy Optimization , 2018, ICLR.
[46] Andreas Krause,et al. Safe Exploration in Finite Markov Decision Processes with Gaussian Processes , 2016, NIPS.
[47] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[48] Shie Mannor,et al. Percentile Optimization for Markov Decision Processes with Parameter Uncertainty , 2010, Oper. Res..
[49] Laurent Orseau,et al. AI Safety Gridworlds , 2017, ArXiv.
[50] Razvan Pascanu,et al. Ray Interference: a Source of Plateaus in Deep Reinforcement Learning , 2019, ArXiv.
[51] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[52] Axel van Lamsweerde,et al. Learning machine learning , 1991 .
[53] Kenji Fukumizu,et al. Universality, Characteristic Kernels and RKHS Embedding of Measures , 2010, J. Mach. Learn. Res..
[54] Alejandro Ribeiro,et al. Learning Safe Policies via Primal-Dual Methods , 2019, 2019 IEEE 58th Conference on Decision and Control (CDC).