Safe Reinforcement Learning
