Safe Q-Learning Method Based on Constrained Markov Decision Processes
Fei Zhu | Xinghong Ling | Yangyang Ge | Quan Liu
[1] Ofir Nachum et al. A Lyapunov-based Approach to Safe Reinforcement Learning, 2018, NeurIPS.
[2] H. Takata et al. Nonlinear feedback control of stabilization problem via formal linearization using Taylor expansion, 2008, 2008 International Symposium on Information Theory and Its Applications.
[3] E. Altman. Constrained Markov Decision Processes, 1999.
[4] Giuseppe Notarstefano et al. Asynchronous Distributed Method of Multipliers for Constrained Nonconvex Optimization, 2018, 2018 European Control Conference (ECC).
[5] Javad Mahmoudimehr et al. A novel multi-objective Dynamic Programming optimization method: Performance management of a solar thermal power plant as a case study, 2019, Energy.
[6] Richard M. Golden et al. Adaptive Learning Algorithm Convergence in Passive and Reactive Environments, 2018, Neural Computation.
[7] Victor C. M. Leung et al. Deep-Reinforcement-Learning-Based Optimization for Cache-Enabled Opportunistic Interference Alignment Wireless Networks, 2017, IEEE Transactions on Vehicular Technology.
[8] Javier García et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[9] Ufuk Topcu et al. Constrained Cross-Entropy Method for Safe Reinforcement Learning, 2020, IEEE Transactions on Automatic Control.
[10] Vivek S. Borkar et al. An actor-critic algorithm for constrained Markov decision processes, 2005, Syst. Control. Lett.
[11] Jerzy Martyna. Power Allocation in Cognitive Radio with Distributed Antenna System, 2017, NEW2AN.
[12] Jiafeng Guo et al. Reinforcement Learning to Rank with Markov Decision Process, 2017, SIGIR.
[13] Yongqiang Li et al. Data-driven approximate value iteration with optimality error bound analysis, 2017, Autom.
[14] Michael L. Littman et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[15] Ana Busic et al. Action-Constrained Markov Decision Processes With Kullback-Leibler Cost, 2018, COLT.
[16] Shie Mannor et al. Policy Gradients with Variance Related Risk Criteria, 2012, ICML.
[17] Shie Mannor et al. Scaling Up Robust MDPs by Reinforcement Learning, 2013, arXiv.
[18] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[19] Pieter Abbeel et al. Constrained Policy Optimization, 2017, ICML.
[20] Mengmou Li et al. Generalized Lagrange Multiplier Method and KKT Conditions With an Application to Distributed Optimization, 2019, IEEE Transactions on Circuits and Systems II: Express Briefs.
[21] Jonathan D. Cohen et al. Toward a Rational and Mechanistic Account of Mental Effort, 2017, Annual Review of Neuroscience.
[22] Tingwen Huang et al. Model-Free Optimal Tracking Control via Critic-Only Q-Learning, 2016, IEEE Transactions on Neural Networks and Learning Systems.
[23] Saso Dzeroski et al. Integrating Guidance into Relational Reinforcement Learning, 2004, Machine Learning.
[24] Pieter Abbeel et al. Autonomous Helicopter Aerobatics through Apprenticeship Learning, 2010, Int. J. Robotics Res.
[25] Behçet Açıkmeşe et al. Controlled Markov Processes With Safety State Constraints, 2019, IEEE Transactions on Automatic Control.
[26] Ta-Wen Kuan et al. VLSI Design of an SVM Learning Core on Sequential Minimal Optimization Algorithm, 2012, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[27] Paulo J. S. Silva et al. Convergence Properties of a Second Order Augmented Lagrangian Method for Mathematical Programs with Complementarity Constraints, 2018, SIAM J. Optim.
[28] Vivek S. Borkar et al. A Learning Algorithm for Risk-Sensitive Cost, 2008, Math. Oper. Res.
[29] Nguyen Dinh et al. An approach to calmness of linear inequality systems from Farkas lemma, 2019, Optim. Lett.
[30] Ather Gattami et al. Reinforcement Learning for Multi-Objective and Constrained Markov Decision Processes, 2019, arXiv:1901.08978.
[31] Andreas Krause et al. Safe controller optimization for quadrotors with Gaussian processes, 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).
[32] Karina Valdivia Delgado et al. Risk-Sensitive Markov Decision Process with Limited Budget, 2017, 2017 Brazilian Conference on Intelligent Systems (BRACIS).
[33] Tom Schaul et al. Prioritized Experience Replay, 2015, ICLR.
[34] Sylvain Calinon et al. A tutorial on task-parameterized movement learning and retrieval, 2015, Intelligent Service Robotics.
[35] Li Xia. Optimization of Markov decision processes under the variance criterion, 2016, Autom.
[36] Shane Legg et al. Human-level control through deep reinforcement learning, 2015, Nature.
[37] Marlos C. Machado et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[38] Andreas Krause et al. Safe Exploration in Finite Markov Decision Processes with Gaussian Processes, 2016, NIPS.
[39] Ioannis P. Vlahavas et al. Learning to Teach Reinforcement Learning Agents, 2017, Mach. Learn. Knowl. Extr.
[40] Zengxin Wei et al. On the Constant Positive Linear Dependence Condition and Its Application to SQP Methods, 1999, SIAM J. Optim.
[41] Peter Dayan et al. Q-learning, 1992, Machine Learning.
[42] Peter Winkler et al. The minimum Manhattan distance and minimum jump of permutations, 2019, J. Comb. Theory, Ser. A.
[43] Tomás Svoboda et al. Safe Exploration Techniques for Reinforcement Learning - An Overview, 2014, MESAS.
[44] Doina Precup et al. Smart exploration in reinforcement learning using absolute temporal difference errors, 2013, AAMAS.
[45] Etienne Perot et al. Deep Reinforcement Learning framework for Autonomous Driving, 2017, Autonomous Vehicles and Machines.
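The page carries no abstract, so purely as an illustration of the topic area: the title, together with the cited works on constrained MDPs [3], Lagrange multiplier methods [20], [27], and Q-learning [41], suggests a Lagrangian-relaxation flavor of safe Q-learning. The Python sketch below is a generic, hypothetical example of that idea and is not the authors' algorithm; the toy chain environment, all constants, and every name in it are invented for illustration.

# Hypothetical sketch: tabular Q-learning on a CMDP via Lagrangian relaxation.
# NOT the paper's method; a generic illustration combining refs [3], [20], [41].

import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 5, 2
GAMMA = 0.95          # discount factor
ALPHA = 0.1           # Q-learning step size
DUAL_STEP = 0.01      # dual-ascent step size for the multiplier
BUDGET = 0.2          # per-step constraint-cost budget d
EPISODES, HORIZON = 2000, 50

Q = np.zeros((N_STATES, N_ACTIONS))
lam = 0.0             # Lagrange multiplier for the cost constraint

def step(s, a):
    """Toy chain MDP: action 1 moves right (reward at the end) but
    incurs a safety cost; action 0 resets to state 0 at no cost."""
    if a == 1:
        s2 = min(s + 1, N_STATES - 1)
        reward = 1.0 if s2 == N_STATES - 1 else 0.0
        cost = 0.3    # the risky action carries a constraint cost
    else:
        s2, reward, cost = 0, 0.0, 0.0
    return s2, reward, cost

for ep in range(EPISODES):
    s, ep_cost = 0, 0.0
    for t in range(HORIZON):
        # epsilon-greedy behavior policy
        a = int(rng.integers(N_ACTIONS)) if rng.random() < 0.1 else int(np.argmax(Q[s]))
        s2, r, c = step(s, a)
        # Lagrangian-penalized reward: r - lam * c
        target = (r - lam * c) + GAMMA * np.max(Q[s2])
        Q[s, a] += ALPHA * (target - Q[s, a])
        ep_cost += c
        s = s2
    # projected dual ascent: raise lam when average cost exceeds the budget
    lam = max(0.0, lam + DUAL_STEP * (ep_cost / HORIZON - BUDGET))

print("learned multiplier:", round(lam, 3))
print("greedy policy:", np.argmax(Q, axis=1))

The multiplier update here is plain projected dual ascent; methods in this literature (e.g., the actor-critic of [10]) typically run the primal and dual updates on two timescales to obtain convergence guarantees.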