Michael I. Jordan | Siyu Chen | Zhuoran Yang | Zhaoran Wang | Yufeng Zhang
[1] Alessandro Lazaric, et al. Analysis of a Classification-based Policy Iteration Algorithm, 2010, ICML.
[2] Sajad Khodadadian, et al. Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm, 2021.
[3] Adel Javanmard, et al. Analysis of a Two-Layer Neural Network via Displacement Convexity, 2019, The Annals of Statistics.
[4] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[5] Mengdi Wang, et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound, 2019, ICML.
[6] Jakub W. Pachocki, et al. Dota 2 with Large Scale Deep Reinforcement Learning, 2019, ArXiv.
[7] Marcin Andrychowicz, et al. Solving Rubik's Cube with a Robot Hand, 2019, ArXiv.
[8] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[9] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[10] Patrick T. Harker, et al. Finite-dimensional variational inequality and nonlinear complementarity problems: A survey of theory, algorithms and applications, 1990, Math. Program.
[11] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[12] Tilman Börgers, et al. Learning Through Reinforcement and Replicator Dynamics, 1997.
[13] C. Villani, et al. Generalization of an Inequality by Talagrand and Links with the Logarithmic Sobolev Inequality, 2000.
[14] Tuo Zhao, et al. On Computation and Generalization of Generative Adversarial Imitation Learning, 2020, ICLR.
[15] Allan Pinkus, et al. Approximation theory of the MLP model in neural networks, 1999, Acta Numerica.
[16] K. Friedrichs. The identity of weak and strong extensions of differential operators, 1944.
[17] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[18] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[19] Francis Bach, et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018, NeurIPS.
[20] Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, ArXiv.
[21] Pierre Baldi, et al. Solving the Rubik's cube with deep reinforcement learning and search, 2019, Nature Machine Intelligence.
[22] Yuan Cao, et al. Mean-Field Analysis of Two-Layer Neural Networks: Non-Asymptotic Rates and Generalization Bounds, 2020, ArXiv.
[23] Csaba Szepesvári, et al. Finite time bounds for sampling based fitted value iteration, 2005, ICML.
[24] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[25] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, COLT.
[26] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[27] Hanze Dong, et al. Over Parameterized Two-level Neural Networks Can Learn Near Optimal Feature Representations, 2019, ArXiv.
[28] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[29] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[30] Jianfeng Lu, et al. A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth, 2020, ICML.
[31] Andrew R. Barron, et al. Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Trans. Inf. Theory.
[32] Zhe Wang, et al. Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms, 2020, ArXiv.
[33] Daniel Hennes, et al. Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients, 2020, AAMAS.
[34] Quanquan Gu, et al. A Finite Time Analysis of Two Time-Scale Actor Critic Methods, 2020, NeurIPS.
[35] Jianfeng Lu, et al. Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime, 2020, ICLR.
[36] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[37] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[38] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[39] Andrea Montanari, et al. Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit, 2019, COLT.
[40] L. Ambrosio, et al. Gradient Flows: In Metric Spaces and in the Space of Probability Measures, 2005.
[41] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.
[42] Zhaoran Wang, et al. A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic, 2020, ArXiv.
[43] Zhaoran Wang, et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence, 2019, ICLR.
[44] L. Ambrosio, et al. A User's Guide to Optimal Transport, 2013.
[45] Shie Mannor, et al. Regularized Policy Iteration with Nonparametric Function Spaces, 2016, J. Mach. Learn. Res.
[46] Colin Wei, et al. Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel, 2018, NeurIPS.
[47] Francis Bach, et al. A Note on Lazy Training in Supervised Differentiable Programming, 2018, ArXiv.
[48] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[49] Yingbin Liang, et al. Improving Sample Complexity Bounds for Actor-Critic Algorithms, 2020, ArXiv.
[50] Qi Cai, et al. Neural Temporal-Difference Learning Converges to Global Optima, 2019, NeurIPS.
[51] Justin A. Sirignano, et al. Mean field analysis of neural networks: A central limit theorem, 2018, Stochastic Processes and their Applications.
[52] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[53] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.
[54] C. Villani. Topics in Optimal Transportation, 2003.
[55] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[56] Chi Jin, et al. Provably Efficient Exploration in Policy Optimization, 2019, ICML.
[57] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[58] C. Villani. Optimal Transport: Old and New, 2008.
[59] Matthieu Geist, et al. Approximate modified policy iteration and its application to the game of Tetris, 2015, J. Mach. Learn. Res.
[60] Zhuoran Yang, et al. Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy, 2020, ICLR.
[61] Andrea Montanari, et al. A mean field view of the landscape of two-layer neural networks, 2018, Proceedings of the National Academy of Sciences.
[62] Tong Zhang, et al. Convex Formulation of Overparameterized Deep Neural Networks, 2019, IEEE Transactions on Information Theory.
[63] Yufeng Zhang, et al. Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory, 2020, NeurIPS.
[64] Konstantinos Spiliopoulos, et al. Mean Field Analysis of Neural Networks: A Law of Large Numbers, 2018, SIAM J. Appl. Math.
[65] Jianfeng Lu, et al. Temporal-difference learning for nonlinear value function approximation in the lazy training regime, 2019, ArXiv.