Regret Bounds for the Adaptive Control of Linear Quadratic Systems
Yasin Abbasi-Yadkori | Csaba Szepesvári
[1] H. Simon et al. Dynamic Programming Under Uncertainty with a Quadratic Criterion Function, 1956.
[2] R. Bellman. Dynamic Programming, 1957, Science.
[3] B. Anderson et al. Linear Optimal Control, 1971.
[4] G. Grisetti et al. Further Reading, 1984, IEEE Spectrum.
[5] T. Lai et al. Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems, 1982.
[6] P. Kumar et al. Adaptive control with the stochastic approximation algorithm: Geometry and convergence, 1985.
[7] H. Robbins et al. Asymptotically efficient adaptive allocation rules, 1985.
[8] Han-Fu Chen et al. Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost, 1986, 25th IEEE Conference on Decision and Control.
[9] T. Lai et al. Asymptotically efficient self-tuning regulators, 1987.
[10] Han-Fu Chen et al. Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost, 1987.
[11] Christian M. Ernst et al. Multi-armed Bandit Allocation Indices, 1989.
[12] Han-Fu Chen et al. Identification and adaptive control for systems with unknown orders, delay, and coefficients, 1990.
[13] Vladimir Vovk et al. Aggregating strategies, 1990, COLT '90.
[14] Christopher G. Atkeson et al. Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming, 1993, NIPS.
[15] Michael I. Jordan et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[16] Manfred K. Warmuth et al. The Weighted Majority Algorithm, 1994, Inf. Comput.
[17] Mahesan Niranjan et al. On-line Q-learning using connectionist systems, 1994.
[18] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[19] Ben J. A. Kröse et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[20] Dimitri P. Bertsekas et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[21] Steven J. Bradtke et al. Incremental dynamic programming for on-line adaptive optimal control, 1995.
[22] Gavin Adrian Rummery. Problem solving with reinforcement learning, 1995.
[23] Leemon C. Baird et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[24] Andrew W. Moore et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[25] John N. Tsitsiklis et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[26] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[27] John N. Tsitsiklis et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[28] David K. Smith et al. Dynamic Programming and Optimal Control, Volume 1, 1996.
[29] Claude-Nicolas Fiechter et al. PAC adaptive control of linear systems, 1997, COLT '97.
[30] Vijay Balasubramanian et al. Statistical Inference, Occam's Razor, and Statistical Mechanics on the Space of Probability Distributions, 1996, Neural Computation.
[31] Shun-ichi Amari et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[32] P. Kumar et al. Adaptive Linear Quadratic Gaussian Control: The Cost-Biased Approach Revisited, 1998.
[33] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[34] John N. Tsitsiklis et al. Average cost temporal-difference learning, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[35] J. W. Nieuwenhuis et al. Book review of D. P. Bertsekas (ed.), Dynamic Programming and Optimal Control, Volume 2, 1999.
[36] Michael I. Jordan et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation, 2001.
[37] Sanjoy Dasgupta et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[38] Christian Schindelhauer et al. Discrete Prediction Games with Arbitrary Feedback and Loss, 2001, COLT/EuroCOLT.
[39] T. Lai et al. Self-Normalized Processes: Limit Theory and Statistical Applications, 2001.
[40] Peter Auer et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[41] Peter Auer et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[42] Shie Mannor et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.
[43] Ronen I. Brafman et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[44] Jeff G. Schneider et al. Covariant policy search, 2003, IJCAI.
[45] A. Shapiro. Monte Carlo Sampling Methods, 2003.
[46] Sham M. Kakade et al. On the sample complexity of reinforcement learning, 2003.
[47] John Langford et al. Exploration in Metric State Spaces, 2003, ICML.
[48] John N. Tsitsiklis et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, 2004, J. Mach. Learn. Res.
[49] Abhijit Gosavi et al. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning, 2003.
[50] Peter Dayan et al. Q-learning, 1992, Machine Learning.
[51] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[52] Steven J. Bradtke et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[53] John N. Tsitsiklis et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[54] Satinder Singh et al. An upper bound on the loss from approximate optimal-value functions, 1994, Machine Learning.
[55] Long Ji Lin et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[56] Vladislav Tadic et al. On the Convergence of Temporal-Difference Learning with Linear Function Approximation, 2001, Machine Learning.
[57] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.
[58] Richard S. Sutton et al. Temporal-Difference Networks, 2004, NIPS.
[59] Sean R. Eddy et al. What is dynamic programming?, 2004, Nature Biotechnology.
[60] Michael L. Littman et al. A theoretical analysis of Model-Based Interval Estimation, 2005, ICML.
[61] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[62] Richard S. Sutton et al. Temporal Abstraction in Temporal-difference Networks, 2005, NIPS.
[63] Lihong Li et al. PAC model-free reinforcement learning, 2006, ICML.
[64] Zhiliang Ying et al. Efficient Recursive Estimation and Adaptive Control in Stochastic Regression and ARMAX Models, 2006.
[65] Nicolò Cesa-Bianchi et al. Regret Minimization Under Partial Monitoring, 2006, IEEE Information Theory Workshop (ITW '06), Punta del Este.
[66] S. Bittanti et al. Adaptive Control of Linear Time Invariant Systems: The "Bet on the Best" Principle, 2006.
[67] Thomas P. Hayes et al. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary, 2006, SODA '06.
[68] R. Sutton. Gain Adaptation Beats Least Squares, 2006.
[69] Warren B. Powell et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, 2006, Machine Learning.
[70] Xi-Ren Cao et al. Stochastic learning and optimization - A sensitivity-based approach, 2007, Annu. Rev. Control.
[71] Dimitri P. Bertsekas et al. Stochastic Optimal Control: The Discrete Time Case, 2007.
[72] Martin A. Riedmiller et al. Neural Reinforcement Learning Controllers for a Real Robot Application, 2007, IEEE International Conference on Robotics and Automation.
[73] Michael L. Littman et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning, 2007, NIPS.
[74] Peter Auer et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 2007, COLT.
[75] Peter Auer et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[76] András Lörincz et al. The many faces of optimism: a unifying approach, 2008, ICML '08.
[77] Richard S. Sutton et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation, 2008, NIPS.
[78] Thomas P. Hayes et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[79] Eli Upfal et al. Multi-Armed Bandits in Metric Spaces, 2008.
[80] Csaba Szepesvári et al. Empirical Bernstein stopping, 2008, ICML '08.
[81] M. Kosorok. Introduction to Empirical Processes and Semiparametric Inference, 2008.
[82] Stefan Schaal et al. Reinforcement learning of motor skills with policy gradients, 2008.
[83] Onur Mutlu et al. Self-Optimizing Memory Controllers: A Reinforcement Learning Approach, 2008, International Symposium on Computer Architecture.
[84] Panos M. Pardalos et al. Approximate dynamic programming: solving the curses of dimensionality, 2009, Optim. Methods Softw.
[85] Ambuj Tewari et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[86] Shalabh Bhatnagar et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[87] Shalabh Bhatnagar et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[88] Makoto Hashizume et al. Development of a colon endoscope robot that adjusts its locomotion through the use of reinforcement learning, 2010, International Journal of Computer Assisted Radiology and Surgery.
[89] John N. Tsitsiklis et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.
[90] Csaba Szepesvári et al. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.
[91] Bart De Schutter et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[92] R. Sutton et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010.
[93] Csaba Szepesvári et al. Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems, 2011, arXiv.
[94] Hyeong Soo Chang et al. Simulation-based Algorithms for Markov Decision Processes, 2013.