Safe Reinforcement Learning
[1] J. Kiefer, et al. Asymptotic Minimax Character of the Sample Distribution Function and of the Classical Multinomial Estimator, 1956.
[2] Norbert Wiener, et al. God and Golem, Inc.: A Comment on Certain Points Where Cybernetics Impinges on Religion, 1964.
[3] M. J. D. Powell, et al. Weighted Uniform Sampling: A Monte Carlo Technique for Reducing Variance, 1966.
[4] T. W. Anderson. Confidence Limits for the Expected Value of an Arbitrary Bounded Random Variable with a Continuous Distribution Function, 1969.
[5] D. Bertsekas, et al. Dynamic Programming and Stochastic Control, 1977, IEEE Transactions on Systems, Man, and Cybernetics.
[6] J. Hammersley. Simulation and the Monte Carlo Method, 1982.
[7] B. Efron. Better Bootstrap Confidence Intervals, 1987.
[8] Robert Tibshirani, et al. An Introduction to the Bootstrap, 1994.
[9] Pranab Kumar Sen, et al. Large Sample Methods in Statistics: An Introduction with Applications, 1993.
[10] Karl Johan Åström, et al. PID Controllers: Theory, Design, and Tuning, 1995.
[11] Y. Benjamini, et al. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing, 1995.
[12] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[13] J. Doyle, et al. Robust and Optimal Control, 1995, Proceedings of the 35th IEEE Conference on Decision and Control.
[14] R. K. Shyamasundar, et al. Introduction to Algorithms, 1996.
[15] R. Wilcox. Introduction to Robust Estimation and Hypothesis Testing, 1997.
[16] Anthony C. Davison, et al. Bootstrap Methods and Their Application, 1998.
[17] Doina Precup, et al. Temporal Abstraction in Reinforcement Learning, 2000, ICML.
[18] J. Carpenter, et al. Bootstrap Confidence Intervals: When, Which, What? A Practical Guide for Medical Statisticians, 2000, Statistics in Medicine.
[19] Douglas C. Hittle, et al. Robust Reinforcement Learning Control with Static and Dynamic Stability, 2001.
[20] Michail G. Lagoudakis, et al. Model-Free Least-Squares Policy Iteration, 2001, NIPS.
[21] Andrew G. Barto, et al. Lyapunov Design for Safe Reinforcement Learning, 2003, J. Mach. Learn. Res.
[22] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[23] Aidan O'Dwyer, et al. Handbook of PI and PID Controller Tuning Rules, 2003.
[24] J. Pankow, et al. Prediction of Coronary Heart Disease in Middle-Aged Adults with Diabetes, 2003, Diabetes Care.
[25] Sham M. Kakade, et al. On the Sample Complexity of Reinforcement Learning, 2003.
[26] Neil Munro, et al. Fast Calculation of Stabilizing PID Controllers, 2003, Autom.
[27] H. Keselman, et al. Modern Robust Data Analysis Methods: Measures of Central Tendency, 2003, Psychological Methods.
[28] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[29] Michael L. Littman, et al. A Theoretical Analysis of Model-Based Interval Estimation, 2005, ICML.
[30] Mame Astou Diouf, et al. Improved Nonparametric Inference for the Mean of a Bounded Random Variable with Application to Poverty Measures, 2005.
[31] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[32] Devinder Thapa, et al. Agent Based Decision Support System Using Reinforcement Learning Under Emergency Circumstances, 2005, ICNC.
[33] Lihong Li, et al. PAC Model-Free Reinforcement Learning, 2006, ICML.
[34] R. H. Myers, et al. Probability & Statistics for Engineers & Scientists, 2016.
[35] Nikolaus Hansen, et al. The CMA Evolution Strategy: A Comparing Review, 2006, Towards a New Evolutionary Computation.
[36] Joel R. Tetreault, et al. Comparing the Utility of State Features in Spoken Dialogue Using Reinforcement Learning, 2006, NAACL.
[37] John N. Tsitsiklis, et al. Bias and Variance Approximation in Value Function Estimates, 2007, Manag. Sci.
[38] Dimitri P. Bertsekas, et al. Stochastic Optimal Control: The Discrete Time Case, 2007.
[39] P. Massart, et al. Concentration Inequalities and Model Selection, 2007.
[40] Peter Auer, et al. Near-Optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[41] R. Sutton, et al. A Convergent O(n) Algorithm for Off-Policy Temporal-Difference Learning with Linear Function Approximation, 2008, NIPS.
[42] Michael L. Littman, et al. An Analysis of Model-Based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[43] Kathleen M. Jagodnik, et al. Creating a Reinforcement Learning Controller for Functional Electrical Stimulation of a Human Arm, 2008, The ... Yale Workshop on Adaptive and Learning Systems.
[44] Shalabh Bhatnagar, et al. Natural Actor-Critic Algorithms, 2009, Autom.
[45] Lihong Li, et al. Reinforcement Learning in Finite MDPs: PAC Analysis, 2009, J. Mach. Learn. Res.
[46] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[47] Richard L. Lewis, et al. Where Do Rewards Come From?, 2009.
[48] Alin Albu-Schäffer, et al. Towards the Robotic Co-Worker, 2009, ISRR.
[49] Lee Spector, et al. Genetic Programming for Reward Function Search, 2010, IEEE Transactions on Autonomous Mental Development.
[50] Larry D. Pyeatt, et al. Reinforcement Learning for Closed-Loop Propofol Anesthesia: A Human Volunteer Study, 2010, IAAI.
[51] Jiming Jiang. Large Sample Techniques for Statistics, 2010, Springer Texts in Statistics.
[52] Csaba Szepesvári, et al. Model-Based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds, 2010, ICML.
[53] Michael Athans, et al. Linear Quadratic Regulator Control, 2010, The Control Systems Handbook.
[54] Richard L. Lewis, et al. Internal Rewards Mitigate Agent Boundedness, 2010, ICML.
[55] D. Bertsekas. Approximate Policy Iteration: A Survey and Some New Methods, 2011.
[56] Tor Lattimore, et al. PAC Bounds for Discounted MDPs, 2012, ALT.
[57] Petros A. Ioannou, et al. Robust Adaptive Control, 2012.
[58] Scott Kuindersma, et al. Variable Risk Control via Stochastic Optimization, 2013, Int. J. Robotics Res.
[59] Thomas G. Dietterich, et al. Allowing a Wildfire to Burn: Estimating the Effect on Future Fire Suppression Costs, 2013.
[60] Joaquin Quiñonero Candela, et al. Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising, 2013, J. Mach. Learn. Res.
[61] David Silver, et al. Concurrent Reinforcement Learning from Customer Interactions, 2013, ICML.
[62] Nicolò Cesa-Bianchi, et al. Bandits With Heavy Tail, 2012, IEEE Transactions on Information Theory.
[63] Ragunathan Rajkumar, et al. Parallel Scheduling for Cyber-Physical Systems: Analysis and Case Study on a Self-Driving Car, 2013, ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS).
[64] Sridhar Mahadevan, et al. Projected Natural Actor-Critic, 2013, NIPS.
[65] Daniele Calandriello, et al. Safe Policy Iteration, 2013, ICML.
[66] Jaime F. Fisac, et al. Reachability-Based Safe Learning with Gaussian Processes, 2014, 53rd IEEE Conference on Decision and Control.
[67] Nick Bostrom, et al. Superintelligence: Paths, Dangers, Strategies, 2014.
[68] Richard S. Sutton, et al. Off-Policy TD(λ) with a True Online Equivalence, 2014, UAI.
[69] Lihong Li, et al. PAC-Inspired Option Discovery in Lifelong Reinforcement Learning, 2014, ICML.
[70] Philip S. Thomas, et al. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees, 2015, IJCAI.
[71] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[72] Scott Niekum, et al. Policy Evaluation Using the Ω-Return, 2015, NIPS.
[73] Emma Brunskill, et al. Concurrent PAC RL, 2015, AAAI.
[74] Shie Mannor, et al. Off-Policy Model-Based Learning under Unknown Factored Dynamics, 2015, ICML.
[75] Philip S. Thomas, et al. Ad Recommendation Systems for Life-Time Value Optimization, 2015, WWW.
[76] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[77] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[78] P. Massart. The Dvoretzky-Kiefer-Wolfowitz Inequality, 2016.