Michael I. Jordan | Chi Jin | Zhuoran Yang | Mengdi Wang | Zhaoran Wang
[1] Qi Cai,et al. Neural Temporal-Difference Learning Converges to Global Optima , 2019, NeurIPS.
[2] Yuan Cao,et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks , 2019, NeurIPS.
[3] Francis Bach,et al. A Note on Lazy Training in Supervised Differentiable Programming , 2018, ArXiv.
[4] Nello Cristianini,et al. Finite-Time Analysis of Kernelised Contextual Bandits , 2013, UAI.
[5] Albin Cassirer,et al. Randomized Prior Functions for Deep Reinforcement Learning , 2018, NeurIPS.
[6] Jason D. Lee,et al. Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks , 2019, ICLR.
[7] Yuan Yao,et al. Mercer's Theorem, Feature Maps, and Smoothing , 2006, COLT.
[8] Ruosong Wang,et al. Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension , 2020, NeurIPS.
[9] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[10] Ruosong Wang,et al. Provably Efficient Reinforcement Learning with General Value Function Approximation , 2020, ArXiv.
[11] Jaehoon Lee,et al. Wide neural networks of any depth evolve as linear models under gradient descent , 2019, NeurIPS.
[12] Benjamin Van Roy,et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration , 2013, NIPS.
[13] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[14] Tor Lattimore,et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning , 2017, NIPS.
[15] Martin J. Wainwright,et al. Divide and conquer kernel ridge regression: a distributed algorithm with minimax optimal rates , 2013, J. Mach. Learn. Res..
[16] Zheng Wen,et al. Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization , 2013, Math. Oper. Res..
[17] Claus Müller. Analysis of Spherical Symmetries in Euclidean Spaces , 1997 .
[18] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[19] Benjamin Van Roy,et al. Model-based Reinforcement Learning and the Eluder Dimension , 2014, NIPS.
[20] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[21] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[22] Shie Mannor,et al. Optimistic Policy Optimization with Bandit Feedback , 2020, ICML.
[23] Daniel Russo,et al. Worst-Case Regret Bounds for Exploration via Randomized Value Functions , 2019, NeurIPS.
[24] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[25] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[26] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[27] Benjamin Van Roy,et al. Comments on the Du-Kakade-Wang-Yang Lower Bounds , 2019, ArXiv.
[28] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[29] Cho-Jui Hsieh,et al. Convergence of Adversarial Training in Overparametrized Neural Networks , 2019, NeurIPS.
[30] Andreas Christmann,et al. Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.
[31] Debdeep Pati,et al. Frequentist coverage and sup-norm convergence rate in Gaussian process regression , 2017, ArXiv:1708.04753.
[32] Mengdi Wang,et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound , 2019, ICML.
[33] Aditya Gopalan,et al. On Kernelized Multi-armed Bandits , 2017, ICML.
[34] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[36] Roman Vershynin,et al. High-Dimensional Probability , 2018 .
[37] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[38] John Langford,et al. PAC Reinforcement Learning with Rich Observations , 2016, NIPS.
[39] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[40] Francis Bach,et al. On Lazy Training in Differentiable Programming , 2018, NeurIPS.
[41] Joelle Pineau,et al. Streaming kernel regression with provably adaptive mean, variance, and regularization , 2017, J. Mach. Learn. Res..
[42] Kazuoki Azuma. Weighted Sums of Certain Dependent Random Variables , 1967 .
[43] John D. Lafferty,et al. Diffusion Kernels on Statistical Manifolds , 2005, J. Mach. Learn. Res..
[44] Nan Jiang,et al. On Oracle-Efficient PAC RL with Rich Observations , 2018, NeurIPS.
[45] Quanquan Gu,et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks , 2019, AAAI.
[46] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[47] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[48] William Yang Wang,et al. Deep Reinforcement Learning for NLP , 2018, ACL.
[49] Yuan Cao,et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks , 2018, ArXiv.
[50] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[51] Guang Cheng,et al. Local and global asymptotic inference in smoothing spline models , 2012, ArXiv:1212.6788.
[52] Yuan Cao,et al. A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks , 2019, ArXiv.
[53] Zheng Wen,et al. Efficient Exploration and Value Function Generalization in Deterministic Systems , 2013, NIPS.
[54] Andreas Krause,et al. No-Regret Learning in Unknown Games with Correlated Payoffs , 2019, NeurIPS.
[55] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[56] Andreas Krause,et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.
[57] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[58] Yann LeCun,et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks , 2018, ArXiv.
[59] Ruosong Wang,et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.
[61] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[62] Jian Peng,et al. √n-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank , 2019, COLT.
[63] Chi Jin,et al. Provably Efficient Exploration in Policy Optimization , 2020, ICML.
[64] Lei Wu,et al. How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective , 2018 .
[65] Ruosong Wang,et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? , 2020, ICLR.
[66] Daniele Calandriello,et al. Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret , 2019, COLT.
[67] Alex Smola,et al. Kernel methods in machine learning , 2007, math/0701907.
[68] Quanquan Gu,et al. Neural Contextual Bandits with UCB-based Exploration , 2019, ICML.
[69] Quanquan Gu,et al. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping , 2020, ICML.
[70] Mengdi Wang,et al. Model-Based Reinforcement Learning with Value-Targeted Regression , 2020, L4DC.
[71] Tor Lattimore,et al. Learning with Good Feature Representations in Bandits and in RL with a Generative Model , 2020, ICML.
[72] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[73] Amit Daniely,et al. SGD Learns the Conjugate Kernel Class of the Network , 2017, NIPS.
[74] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[75] Akshay Krishnamurthy,et al. Information Theoretic Regret Bounds for Online Nonlinear Control , 2020, NeurIPS.
[76] S. Mendelson,et al. Regularization in kernel learning , 2010, ArXiv:1001.2094.
[77] Yuanzhi Li,et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.
[78] Francis R. Bach,et al. Breaking the Curse of Dimensionality with Convex Neural Networks , 2014, J. Mach. Learn. Res..
[79] Alessandro Lazaric,et al. Learning Near Optimal Policies with Low Inherent Bellman Error , 2020, ICML.
[80] Nan Jiang,et al. Provably efficient RL with Rich Observations via Latent State Decoding , 2019, ICML.
[81] Han Liu,et al. Nonparametric Heterogeneity Testing For Massive Data , 2016, ArXiv:1601.06212.
[82] Alessandro Lazaric,et al. Frequentist Regret Bounds for Randomized Least-Squares Value Iteration , 2019, AISTATS.
[83] Arthur Jacot,et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks , 2018, NeurIPS.
[84] Ruosong Wang,et al. Optimism in Reinforcement Learning with Generalized Linear Function Approximation , 2019, ICLR.
[85] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[86] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[87] Andreas Krause,et al. Contextual Gaussian Process Bandit Optimization , 2011, NIPS.