Reinforcement Learning: Theory and Algorithms
[1] Ruosong Wang,et al. What are the Statistical Limits of Offline RL with Linear Function Approximation? , 2020, ICLR.
[2] Quanquan Gu,et al. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping , 2020, ICML.
[3] Aleksandrs Slivkins,et al. Corruption Robust Exploration in Episodic Reinforcement Learning , 2019, COLT.
[4] Sham M. Kakade,et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift , 2019, J. Mach. Learn. Res..
[5] Siddhartha Srinivasa,et al. Imitation Learning as f-Divergence Minimization , 2019, WAFR.
[6] Wen Sun,et al. PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning , 2020, NeurIPS.
[7] Andrey Kolobov,et al. Policy Improvement from Multiple Experts , 2020, ArXiv.
[8] S. Kakade,et al. FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs , 2020, NeurIPS.
[9] Mengdi Wang,et al. Model-Based Reinforcement Learning with Value-Targeted Regression , 2020, L4DC.
[10] Yuxin Chen,et al. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model , 2020, NeurIPS.
[11] Dale Schuurmans,et al. On the Global Convergence Rates of Softmax Policy Gradient Methods , 2020, ICML.
[12] Mikael Henaff,et al. Disagreement-Regularized Imitation Learning , 2020, ICLR.
[13] Nan Jiang,et al. $Q^\star$ Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison , 2020, ArXiv.
[14] Dylan J. Foster,et al. Logarithmic Regret for Adversarial Online Control , 2020, ICML.
[15] Mengdi Wang,et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation , 2020, ICML.
[16] Babak Hassibi,et al. The Power of Linear Controllers in LQR Control , 2020, 2022 IEEE 61st Conference on Decision and Control (CDC).
[17] Max Simchowitz,et al. Naive Exploration is Optimal for Online LQR , 2020, ICML.
[18] Max Simchowitz,et al. Improper Learning for Non-Stochastic Control , 2020, COLT.
[19] Ambuj Tewari,et al. Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles , 2019, AISTATS.
[20] Ruosong Wang,et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? , 2020, ICLR.
[21] S. Kakade,et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes , 2019, COLT.
[22] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[23] Mengdi Wang,et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound , 2019, ICML.
[24] Qi Cai,et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy , 2019, ArXiv.
[25] Lin F. Yang,et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal , 2019, COLT.
[26] Peter L. Bartlett,et al. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction , 2019, ICML.
[27] Byron Boots,et al. Provably Efficient Imitation Learning from Observation Alone , 2019, ICML.
[28] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[29] Sham M. Kakade,et al. Online Control with Adversarial Disturbances , 2019, ICML.
[30] Benjamin Recht,et al. Certainty Equivalent Control of LQR is Efficient , 2019, ArXiv.
[31] Yishay Mansour,et al. Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret , 2019, ArXiv.
[32] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[33] Matthieu Geist,et al. A Theory of Regularized Markov Decision Processes , 2019, ICML.
[34] Nicolas Le Roux,et al. Understanding the impact of entropy on policy optimization , 2018, ICML.
[35] Nan Jiang,et al. Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches , 2018, COLT.
[36] Lin F. Yang,et al. Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model , 2018, ArXiv.
[37] Nikolai Matni,et al. On the Sample Complexity of the Linear Quadratic Regulator , 2017, Foundations of Computational Mathematics.
[38] Nikolai Matni,et al. A System-Level Approach to Controller Synthesis , 2016, IEEE Transactions on Automatic Control.
[39] Nan Jiang,et al. On Oracle-Efficient PAC Reinforcement Learning with Rich Observations , 2018 .
[40] Nikolai Matni,et al. Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator , 2018, NeurIPS.
[41] Sergey Levine,et al. Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review , 2018, ArXiv.
[42] Nolan Wagener,et al. Fast Policy Learning through Imitation and Reinforcement , 2018, UAI.
[43] Nan Jiang,et al. Hierarchical Imitation and Reinforcement Learning , 2018, ICML.
[44] Michael I. Jordan,et al. Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification , 2018, COLT.
[45] Byron Boots,et al. Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning , 2018, ICLR.
[46] Byron Boots,et al. Convergence of Value Aggregation for Imitation Learning , 2018, AISTATS.
[47] Sham M. Kakade,et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator , 2018, ICML.
[48] Sergey Levine,et al. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations , 2017, Robotics: Science and Systems.
[49] Tom Schaul,et al. Deep Q-learning From Demonstrations , 2017, AAAI.
[50] Prateek Jain,et al. Non-convex Optimization for Machine Learning , 2017, Found. Trends Mach. Learn..
[51] Amir Beck,et al. First-Order Methods in Optimization , 2017 .
[52] Marcello Restelli,et al. Boosted Fitted Q-Iteration , 2017, ICML.
[53] Vicenç Gómez,et al. A unified view of entropy-regularized Markov decision processes , 2017, ArXiv.
[54] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[55] Byron Boots,et al. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction , 2017, ICML.
[56] Tor Lattimore,et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning , 2017, NIPS.
[57] Evangelos A. Theodorou,et al. Model Predictive Path Integral Control: From Theory to Parallel Computation , 2017 .
[58] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[59] Benjamin Van Roy,et al. On Lower Bounds for Regret in Reinforcement Learning , 2016, ArXiv.
[60] Stefano Ermon,et al. Generative Adversarial Imitation Learning , 2016, NIPS.
[61] John Langford,et al. PAC Reinforcement Learning with Rich Observations , 2016, NIPS.
[62] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[63] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[64] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[65] John Langford,et al. Learning to Search Better than Your Teacher , 2015, ICML.
[66] Sébastien Bubeck,et al. Convex Optimization: Algorithms and Complexity , 2014, Found. Trends Mach. Learn..
[67] Sergey Levine,et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics , 2014, NIPS.
[68] Matthieu Geist,et al. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search , 2014, ECML/PKDD.
[69] J. Andrew Bagnell,et al. Reinforcement and Imitation Learning via Interactive No-Regret Learning , 2014, ArXiv.
[70] Bruno Scherrer,et al. Approximate Policy Iteration Schemes: A Comparison , 2014, ICML.
[71] Saeed Ghadimi,et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..
[72] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[73] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[74] Martial Hebert,et al. Activity Forecasting , 2012, ECCV.
[75] Leslie Pack Kaelbling,et al. LQR-RRT*: Optimal sampling-based motion planning with automatically derived extension heuristics , 2012, 2012 IEEE International Conference on Robotics and Automation.
[76] Shai Shalev-Shwartz,et al. Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..
[77] Csaba Szepesvári,et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.
[78] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[79] Yinyu Ye,et al. The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate , 2011, Math. Oper. Res..
[80] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[81] John Langford,et al. Contextual Bandit Algorithms with Supervised Learning Guarantees , 2010, AISTATS.
[82] Anind K. Dey,et al. Modeling Interaction via the Principle of Maximum Causal Entropy , 2010, ICML.
[83] Alessandro Lazaric,et al. Analysis of a Classification-based Policy Iteration Algorithm , 2010, ICML.
[84] J. Andrew Bagnell,et al. Efficient Reductions for Imitation Learning , 2010, AISTATS.
[85] Siddhartha S. Srinivasa,et al. Planning-based prediction for pedestrians , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[86] Yishay Mansour,et al. Online Markov Decision Processes , 2009, Math. Oper. Res..
[87] Russ Tedrake,et al. LQR-trees: Feedback motion planning on sparse randomized trees , 2009, Robotics: Science and Systems.
[88] John Langford,et al. Search-based structured prediction , 2009, Machine Learning.
[89] Sham M. Kakade,et al. A spectral algorithm for learning Hidden Markov Models , 2008, J. Comput. Syst. Sci..
[90] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[91] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.
[92] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[93] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[94] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[95] Robert E. Schapire,et al. A Game-Theoretic Approach to Apprenticeship Learning , 2007, NIPS.
[96] Kevin L. Moore,et al. Iterative Learning Control: Brief Survey and Categorization , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[97] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[98] Yinyu Ye,et al. A New Complexity Result on Solving the Markov Decision Problem , 2005, Math. Oper. Res..
[99] Rémi Munos,et al. Error Bounds for Approximate Value Iteration , 2005, AAAI.
[100] E. Todorov,et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems , 2005, Proceedings of the 2005, American Control Conference, 2005..
[101] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[102] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[103] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[104] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[105] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.
[106] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[107] Jeff G. Schneider,et al. Policy Search by Dynamic Programming , 2003, NIPS.
[108] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[109] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[110] Jeff G. Schneider,et al. Covariant policy search , 2003, IJCAI.
[111] Venkataramanan Balakrishnan,et al. Semidefinite programming duality and linear time-invariant systems , 2003, IEEE Trans. Autom. Control..
[112] Sham M. Kakade,et al. On the sample complexity of reinforcement learning , 2003 .
[113] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[114] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[115] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[116] S. R. Jammalamadaka,et al. Empirical Processes in M-Estimation , 2001 .
[117] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[118] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[119] Yishay Mansour,et al. Approximate Planning in Large POMDPs via Reusable Trajectories , 1999, NIPS.
[120] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[121] Michael Kearns,et al. Efficient Reinforcement Learning in Factored MDPs , 1999, IJCAI.
[122] Yishay Mansour,et al. On the Complexity of Policy Iteration , 1999, UAI.
[123] Philip M. Long,et al. Associative Reinforcement Learning using Linear Probabilistic Concepts , 1999, ICML.
[124] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[125] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[126] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[127] Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[128] B. Anderson,et al. Optimal control: linear quadratic methods , 1990 .
[129] Colin McDiarmid,et al. Surveys in Combinatorics, 1989: On the method of bounded differences , 1989 .
[130] Dean Pomerleau,et al. ALVINN, an autonomous land vehicle in a neural network , 2015 .
[131] Dante C. Youla,et al. Modern Wiener-Hopf Design of Optimal Controllers. Part I , 1976 .
[132] R Bellman,et al. DYNAMIC PROGRAMMING AND LAGRANGE MULTIPLIERS. , 1956, Proceedings of the National Academy of Sciences of the United States of America.
[133] H. Robbins. Some aspects of the sequential design of experiments , 1952 .