Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints