Safe Reinforcement Learning
