On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces