暂无分享,去创建一个
[1] Roman Vershynin,et al. Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.
[2] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[3] Luc Devroye,et al. Random-Walk Perturbations for Online Combinatorial Optimization , 2015, IEEE Transactions on Information Theory.
[4] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[5] Hilbert J. Kappen,et al. On the Sample Complexity of Reinforcement Learning with a Generative Model , 2012, ICML.
[6] Jianfeng Gao,et al. Deep Reinforcement Learning for Dialogue Generation , 2016, EMNLP.
[7] John N. Tsitsiklis,et al. Linearly Parameterized Bandits , 2008, Math. Oper. Res..
[8] Wei Chu,et al. Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.
[9] Yanjun Han,et al. Batched Multi-armed Bandits Problem , 2019, NeurIPS.
[10] Yishay Mansour,et al. Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret , 2019, ICML.
[11] Mengdi Wang,et al. Reinforcement Leaning in Feature Space: Matrix Bandit, Kernels, and Regret Bound , 2019, ICML.
[12] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[13] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[14] David Simchi-Levi,et al. Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective , 2020, COLT.
[15] Cynthia Rudin,et al. A Practical Bandit Method with Advantages in Neural Network Tuning , 2019, ArXiv.
[16] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[17] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[18] Lin F. Yang,et al. A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost , 2021, ArXiv.
[19] Silvio Lattanzi,et al. Consistent Online Optimization: Convex and Submodular , 2019, AISTATS.
[20] Nikolai Matni,et al. Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator , 2018, NeurIPS.
[21] Yuan Zhou,et al. Linear bandits with limited adaptivity and learning distributional optimal design , 2020, STOC.
[22] Quanquan Gu,et al. Logarithmic Regret for Reinforcement Learning with Linear Function Approximation , 2020, ICML.
[23] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[24] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[25] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[26] Francisco S. Melo,et al. Q -Learning with Linear Function Approximation , 2007, COLT.
[27] Reid G. Simmons,et al. Complexity Analysis of Real-Time Reinforcement Learning , 1993, AAAI.
[28] Michael I. Jordan,et al. Bridging Exploration and General Function Approximation in Reinforcement Learning: Provably Efficient Kernel and Neural Value Iterations , 2020, ArXiv.
[29] Ruosong Wang,et al. Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle , 2019, NeurIPS.
[30] Ruosong Wang,et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? , 2020, ICLR.
[31] Yuval Peres,et al. Bandits with switching costs: T2/3 regret , 2013, STOC.
[32] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[33] Chi Jin,et al. Provably Efficient Exploration in Policy Optimization , 2020, ICML.
[34] Quanquan Gu,et al. Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes , 2020, COLT.
[35] Vianney Perchet,et al. Batched Bandit Problems , 2015, COLT.
[36] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[37] Ruosong Wang,et al. Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension , 2020, NeurIPS.
[38] Ruosong Wang,et al. Optimism in Reinforcement Learning with Generalized Linear Function Approximation , 2019, ICLR.
[39] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[40] Kunal Talwar,et al. Online learning over a finite action set with limited switching , 2018, COLT.
[41] Xiangyang Ji,et al. Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon , 2020, ArXiv.
[42] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[43] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[44] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[45] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[46] Ruosong Wang,et al. Provably Efficient Reinforcement Learning with General Value Function Approximation , 2020, ArXiv.
[47] Gergely Neu,et al. A Unifying View of Optimism in Episodic Reinforcement Learning , 2020, NeurIPS.
[48] Zheng Wen,et al. Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization , 2013, Math. Oper. Res..
[49] Nicolò Cesa-Bianchi,et al. Online Learning with Switching Costs and Other Adaptive Adversaries , 2013, NIPS.
[50] Csaba Szepesvári,et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.
[51] Yu Bai,et al. Provably Efficient Q-Learning with Low Switching Cost , 2019, NeurIPS.
[52] Yishay Mansour,et al. Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret , 2019, ArXiv.
[53] Berthold Vöcking,et al. Regret Minimization for Online Buffering Problems Using the Weighted Majority Algorithm , 2010, Electron. Colloquium Comput. Complex..
[54] David S. Touretzky,et al. Advances in neural information processing systems 2 , 1989 .
[55] Xian Wu,et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes , 2017, SODA.
[56] Martin J. Wainwright,et al. Variance-reduced Q-learning is minimax optimal , 2019, ArXiv.
[57] Ambuj Tewari,et al. Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret , 2012, ICML.
[58] Alessandro Lazaric,et al. Learning Near Optimal Policies with Low Inherent Bellman Error , 2020, ICML.
[59] Quanquan Gu,et al. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping , 2020, ICML.
[60] Mengdi Wang,et al. Model-Based Reinforcement Learning with Value-Targeted Regression , 2020, L4DC.
[61] Tor Lattimore,et al. Learning with Good Feature Representations in Bandits and in RL with a Generative Model , 2020, ICML.
[62] Zheng Wen,et al. Efficient Exploration and Value Function Generalization in Deterministic Systems , 2013, NIPS.
[63] Kamyar Azizzadenesheli,et al. Efficient Exploration Through Bayesian Deep Q-Networks , 2018, 2018 Information Theory and Applications Workshop (ITA).
[64] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[65] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[66] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[67] Nevena Lazic,et al. Model-Free Linear Quadratic Control via Reduction to Expert Prediction , 2018, AISTATS.
[68] David B. Dunson,et al. Lipschitz Bandit Optimization with Improved Efficiency , 2019, ArXiv.
[69] Hilbert J. Kappen,et al. Speedy Q-Learning , 2011, NIPS.
[70] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[71] Tor Lattimore,et al. PAC Bounds for Discounted MDPs , 2012, ALT.
[72] Yanjun Han,et al. Sequential Batch Learning in Finite-Action Linear Contextual Bandits , 2020, ArXiv.
[73] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..
[74] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[75] Chi Jin. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2020 .
[76] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[77] Benjamin Van Roy,et al. On Lower Bounds for Regret in Reinforcement Learning , 2016, ArXiv.
[78] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[79] Alessandro Lazaric,et al. Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems , 2018, ICML.
[80] Tor Lattimore,et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning , 2017, NIPS.
[81] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[82] Amin Karbasi,et al. Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition , 2020, NeurIPS.