A comprehensive survey on safe reinforcement learning
[1] John McCarthy,et al. Programs with common sense , 1960 .
[2] J. Cockcroft. Investment in Science , 1962, Nature.
[3] R. Howard,et al. Risk-Sensitive Markov Decision Processes , 1972 .
[4] Philip Klahr,et al. Advice-Taking and Knowledge Refinement: An Iterative View of Skill Acquisition , 1980 .
[5] M. J. Sobel,et al. Discounted MDP's: distribution functions and exponential utility maximization , 1987 .
[6] C. Watkins. Learning from delayed rewards , 1989 .
[7] Long Ji Lin,et al. Programming Robots Using Reinforcement Learning and Teaching , 1991, AAAI.
[8] Paul E. Utgoff,et al. Two Kinds of Training Information For Evaluation Function Learning , 1991, AAAI.
[9] Paul E. Utgoff,et al. A Teaching Method for Reinforcement Learning , 1992, ML.
[10] Eitan Altman,et al. Asymptotic properties of constrained Markov Decision Processes , 1993, ZOR Methods Model. Oper. Res..
[11] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[12] Matthias Heger,et al. Consideration of Risk in Reinforcement Learning , 1994, ICML.
[13] Paul E. Utgoff,et al. On integrating apprentice learning and reinforcement learning , 1996 .
[14] J. Doyle,et al. Robust and optimal control , 1995, Proceedings of 35th IEEE Conference on Decision and Control.
[15] S. Marcus,et al. Mixed Risk-Neutral/Minimax Control of Markov Decision Processes , 1997 .
[16] J. Clouse. On integrating apprentice learning and reinforcement learning , 1997 .
[17] Gerald Sommer,et al. Learning by biasing , 1998, Proceedings. 1998 IEEE International Conference on Robotics and Automation (Cat. No.98CH36146).
[18] G. Cybenko,et al. Minimax-based reinforcement learning with state aggregation , 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).
[19] Senén Barro,et al. Supervised Reinforcement Learning: Application to a Wall Following Behaviour in a Mobile Robot , 1998, IEA/AIE.
[20] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[21] Steven I. Marcus,et al. Risk-sensitive and minimax control of discrete-time, finite-state Markov decision processes , 1999, Autom..
[22] Helmut Mausser,et al. Beyond VaR: from measuring risk to managing risk , 1999, Proceedings of the IEEE/IAFE 1999 Conference on Computational Intelligence for Financial Engineering (CIFEr) (IEEE Cat. No.99TH8408).
[23] Steven I. Marcus,et al. Mixed risk-neutral/minimax control of discrete-time, finite-state Markov decision processes , 2000, IEEE Trans. Autom. Control..
[24] Leslie Pack Kaelbling,et al. Practical Reinforcement Learning in Continuous Spaces , 2000, ICML.
[25] Vivek S. Borkar,et al. A sensitivity formula for risk-sensitive cost and the actor-critic algorithm , 2001, Syst. Control. Lett..
[26] Makoto Sato,et al. TD algorithm for the variance of return and mean-variance reinforcement learning , 2001 .
[27] Stephen D. Patek,et al. On terminating Markov decision processes with a risk-averse objective function , 2001, Autom..
[28] M. Rosenstein,et al. Supervised Learning Combined with an Actor-Critic Architecture , 2002 .
[29] Vivek S. Borkar,et al. Q-Learning for Risk-Sensitive Control , 2002, Math. Oper. Res..
[30] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[31] Marcus Hutter,et al. Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures , 2002, COLT.
[32] Pedro Campos,et al. Abalearn: Efficient Self-Play Learning of the game Abalone , 2003 .
[33] Sven Koenig,et al. Risk-averse auction agents , 2003, AAMAS '03.
[34] Suman Chakravorty,et al. Minimax Reinforcement Learning , 2003 .
[35] Chris Gaskett,et al. Reinforcement learning under circumstances beyond its control , 2003 .
[36] Carlos V. Regueiro,et al. Using Prior Knowledge to Improve Reinforcement Learning in Mobile Robotics , 2004 .
[37] Ralph Neuneier,et al. Risk-Sensitive Reinforcement Learning , 1998, Machine Learning.
[38] Saso Dzeroski,et al. Integrating Guidance into Relational Reinforcement Learning , 2004, Machine Learning.
[39] A. Moore,et al. Learning decisions: robustness, uncertainty, and approximation , 2004 .
[40] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[41] Gregory Kuhlmann,et al. Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer , 2004, AAAI.
[42] Longxin Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching , 2004, Machine Learning.
[43] Richard Maclin,et al. Knowledge-Based Support-Vector Regression for Reinforcement Learning , 2005 .
[44] Pieter Abbeel,et al. Exploration and apprenticeship learning in reinforcement learning , 2005, ICML.
[45] Fritz Wysotzki,et al. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints , 2005, J. Artif. Intell. Res..
[46] Laurent El Ghaoui,et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices , 2005, Oper. Res..
[47] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[48] Garud Iyengar,et al. Robust Dynamic Programming , 2005, Math. Oper. Res..
[49] Jude W. Shavlik,et al. Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another , 2005, ECML.
[50] Frederic Maire,et al. Apprenticeship Learning for Initial Value Functions in Reinforcement Learning , 2005, IJCAI.
[51] Giorgio Szegö,et al. Measures of risk , 2002, Eur. J. Oper. Res..
[52] Jude W. Shavlik,et al. Creating Advice-Taking Reinforcement Learners , 1998, Machine Learning.
[53] Jude W. Shavlik,et al. Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression , 2005, AAAI.
[54] Andrea Lockerd Thomaz,et al. Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance , 2006, AAAI.
[55] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[56] Peter Geibel,et al. Reinforcement Learning for MDPs with Constraints , 2006, ECML.
[57] Manuela M. Veloso,et al. Probabilistic policy reuse in a reinforcement learning agent , 2006, AAMAS '06.
[58] Masami Yasuda,et al. Discounted Markov decision processes with utility constraints , 2006, Comput. Math. Appl..
[59] Gerald Sommer,et al. Evolutionary reinforcement learning of artificial neural networks , 2007, Int. J. Hybrid Intell. Syst..
[60] Peter Stone,et al. Representation Transfer for Reinforcement Learning , 2007, AAAI Fall Symposium: Computational Approaches to Representation Change during Learning and Development.
[61] Hamdy A. Taha,et al. Operations research: an introduction , 1982 .
[62] Changming Yin,et al. Risk-sensitive reinforcement learning algorithms with generalized average criterion , 2007 .
[63] Peter Stone,et al. Transfer Learning via Inter-Task Mappings for Temporal Difference Learning , 2007, J. Mach. Learn. Res..
[64] Brahim Chaib-draa,et al. Reducing the complexity of multiagent reinforcement learning , 2007, AAMAS '07.
[65] Hisashi Kashima. Risk-Sensitive Learning via Minimization of Empirical Conditional Value-at-Risk , 2007, IEICE Trans. Inf. Syst..
[66] Pieter Abbeel,et al. Autonomous Autorotation of an RC Helicopter , 2008, ISER.
[67] Pieter Abbeel,et al. Apprenticeship learning and reinforcement learning with application to robotic control , 2008 .
[68] Marcus Hutter,et al. On the Possibility of Learning in Reactive Environments with Arbitrary Dependence , 2008, Theor. Comput. Sci..
[69] Steffen Udluft,et al. Safe exploration for reinforcement learning , 2008, ESANN.
[70] Victor Uc Cetina. Autonomous agent learning using an actor-critic algorithm and behavior models , 2008, AAMAS.
[71] Carlos A. Coello Coello,et al. Seeding the initial population of a multi-objective evolutionary algorithm using gradient-based information , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[72] Vivek S. Borkar,et al. A Learning Algorithm for Risk-Sensitive Cost , 2008, Math. Oper. Res..
[73] Andrea Lockerd Thomaz,et al. Teachable robots: Understanding human teaching behavior to build more effective robot learners , 2008, Artif. Intell..
[74] Michael L. Littman,et al. Multi-resolution Exploration in Continuous Spaces , 2008, NIPS.
[75] Peter Stone,et al. Interactively shaping agents via human reinforcement: the TAMER framework , 2009, K-CAP '09.
[76] Peter Stone,et al. Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..
[77] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..
[78] Abhijit Gosavi. Reinforcement learning for model building and variance-penalized control , 2009, Proceedings of the 2009 Winter Simulation Conference (WSC).
[79] Shimon Whiteson,et al. Neuroevolutionary reinforcement learning for generalized helicopter control , 2009, GECCO.
[80] Pierre-Yves Oudeyer,et al. R-IAC: Robust Intrinsically Motivated Exploration and Active Learning , 2009, IEEE Transactions on Autonomous Mental Development.
[81] Javier de Lope,et al. Learning Autonomous Helicopter Flight with Evolutionary Reinforcement Learning , 2009 .
[82] Manuela M. Veloso,et al. Interactive Policy Learning through Confidence-Based Autonomy , 2014, J. Artif. Intell. Res..
[83] Pierre-Yves Oudeyer,et al. Robust intrinsically motivated exploration and active learning , 2009, 2009 IEEE 8th International Conference on Development and Learning.
[84] Naoki Abe,et al. Optimizing debt collections using constrained reinforcement learning , 2010, KDD.
[85] Masashi Sugiyama,et al. Nonparametric Return Distribution Approximation for Reinforcement Learning , 2010, ICML.
[86] Pieter Abbeel,et al. Autonomous Helicopter Aerobatics through Apprenticeship Learning , 2010, Int. J. Robotics Res..
[87] Masashi Sugiyama,et al. Parametric Return Density Estimation for Reinforcement Learning , 2010, UAI.
[88] Thomas G. Dietterich,et al. Reinforcement Learning Via Practice and Critique Advice , 2010, AAAI.
[89] Peter Stone,et al. Combining manual feedback with subsequent MDP reward signals for reinforcement learning , 2010, AAMAS.
[90] Pieter Abbeel,et al. Parameterized maneuver learning for autonomous helicopter flight , 2010, 2010 IEEE International Conference on Robotics and Automation.
[91] Shie Mannor,et al. Percentile Optimization for Markov Decision Processes with Parameter Uncertainty , 2010, Oper. Res..
[92] Javier García,et al. Probabilistic Policy Reuse for inter-task transfer learning , 2010, Robotics Auton. Syst..
[93] Francisco Javier García-Polo,et al. Safe reinforcement learning in high-risk tasks through policy improvement , 2011, ADPRL.
[94] Shimon Whiteson,et al. Neuroevolutionary reinforcement learning for generalized control of simulated helicopters , 2011, Evol. Intell..
[95] Alborz Geramifard,et al. UAV cooperative control with stochastic risk models , 2011, Proceedings of the 2011 American Control Conference.
[96] John N. Tsitsiklis,et al. Mean-Variance Optimization in Markov Decision Processes , 2011, ICML.
[97] Clayton T. Morrison,et al. Blending Autonomous Exploration and Apprenticeship Learning , 2011, NIPS.
[98] Lisa A. Torrey. Help an Agent Out : Student / Teacher Learning in Sequential Decision Tasks , 2011 .
[99] Matthew E. Taylor,et al. Understanding Human Teaching Modalities in Reinforcement Learning Environments: A Preliminary Report , 2011 .
[100] Sonia Chernova,et al. Effect of human guidance and state space size on Interactive Reinforcement Learning , 2011, 2011 RO-MAN.
[101] Michael L. Littman,et al. Efficient model-based exploration in continuous state-space environments , 2011 .
[102] Pradyot V. N. Korupolu,et al. Beyond Rewards : Learning from Richer Supervision , 2011 .
[103] Peter Stone,et al. TEXPLORE: real-time sample-efficient reinforcement learning for robots , 2012, Machine Learning.
[104] Javier García,et al. Safe Exploration of State and Action Spaces in Reinforcement Learning , 2012, J. Artif. Intell. Res..
[105] Alborz Geramifard,et al. Practical reinforcement learning using representation learning and safe exploration for large scale Markov decision processes , 2012 .
[106] Matthew E. Taylor,et al. Towards student/teacher learning in sequential decision tasks , 2012, AAMAS.
[107] Takayuki Osogami,et al. Robustness and risk-sensitivity in Markov decision processes , 2012, NIPS.
[108] Yibin Li,et al. An efficient initialization approach of Q-learning for mobile robots , 2012 .
[109] Michael T. Rosenstein,et al. Supervised Actor‐Critic Reinforcement Learning , 2012 .
[110] Pieter Abbeel,et al. Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds , 2012, NIPS.
[111] Pieter Abbeel,et al. Safe Exploration in Markov Decision Processes , 2012, ICML.
[112] Shie Mannor,et al. Policy Gradients with Variance Related Risk Criteria , 2012, ICML.
[113] Sameera S. Ponda,et al. Risk allocation strategies for distributed chance-constrained task allocation , 2013, 2013 American Control Conference.
[114] Carlos V. Regueiro,et al. Learning on real robots from experience and simple user feedback , 2013 .
[115] Shie Mannor,et al. Scaling Up Robust MDPs by Reinforcement Learning , 2013, ArXiv.
[116] Alborz Geramifard,et al. Intelligent Cooperative Control Architecture: A Framework for Performance Improvement Using Safe Learning , 2013, J. Intell. Robotic Syst..
[117] Doina Precup,et al. Smart exploration in reinforcement learning using absolute temporal difference errors , 2013, AAMAS.
[118] Mi-Ching Tsai,et al. Robust and Optimal Control , 2014 .