A Theoretical Analysis of Deep Q-Learning
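The paper titled above analyzes deep Q-learning, whose core is the Q-learning update bootstrapped through a max over next-state actions. As context for the bibliography, here is a minimal tabular sketch on a hypothetical two-state, two-action MDP (the MDP, step counts, and rates are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Hypothetical deterministic MDP for illustration only.
# transitions[state][action] -> (next_state, reward)
transitions = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 1.0)},
}

gamma, alpha = 0.9, 0.5      # discount factor and learning rate (assumed values)
Q = np.zeros((2, 2))         # tabular Q-function: Q[state, action]

rng = np.random.default_rng(0)
s = 0
for _ in range(2000):
    a = int(rng.integers(2))             # uniformly random exploration
    s_next, r = transitions[s][a]
    # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q)
```

Since action 1 yields reward 1 at every step, the fixed point satisfies Q*(s, 1) = 1 + 0.9 * 10 = 10 and Q*(s, 0) = 0.9 * 10 = 9; the iterates converge to these values. Deep Q-learning replaces the table `Q` with a neural network fit to the same bootstrapped targets.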
[1] E. Rowland. Theory of Games and Economic Behavior , 1946, Nature.
[2] L. Shapley,et al. Stochastic Games* , 1953, Proceedings of the National Academy of Sciences.
[3] J. Friedman,et al. Projection Pursuit Regression , 1981 .
[4] C. J. Stone,et al. Optimal Global Rates of Convergence for Nonparametric Regression , 1982 .
[5] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[6] Wolfgang Maass,et al. Neural Nets with Superlinear VC-Dimension , 1994, Neural Computation.
[7] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[8] Dimitri P. Bertsekas,et al. Stochastic shortest path games: theory and algorithms , 1997 .
[10] Peter L. Bartlett,et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.
[11] Peter L. Bartlett,et al. Almost Linear VC-Dimension Bounds for Piecewise Polynomial Networks , 1998, Neural Computation.
[12] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[13] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[14] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[15] Manuela M. Veloso,et al. Rational and Convergent Learning in Stochastic Games , 2001, IJCAI.
[17] Michail G. Lagoudakis,et al. Value Function Approximation in Zero-Sum Markov Games , 2002, UAI.
[18] S. Murphy,et al. Optimal dynamic treatment regimes , 2003 .
[19] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[20] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[21] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[22] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[23] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[24] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[25] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[26] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[27] Susan A. Murphy,et al. A Generalization Error for Q-Learning , 2005, J. Mach. Learn. Res..
[28] Vincent Conitzer,et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents , 2003, Machine Learning.
[29] Csaba Szepesvári,et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path , 2006, COLT.
[30] A. Antos,et al. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[31] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[32] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[33] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[34] Alex Smola,et al. Kernel methods in machine learning , 2007, math/0701907.
[35] Benjamin Recht,et al. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , 2008, NIPS.
[36] Alexandre B. Tsybakov,et al. Introduction to Nonparametric Estimation , 2008, Springer series in statistics.
[37] Shie Mannor,et al. Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems , 2009, 2009 American Control Conference.
[38] M. Kosorok,et al. Reinforcement learning design for cancer clinical trials , 2009, Statistics in medicine.
[39] Alessandro Lazaric,et al. Analysis of a Classification-based Policy Iteration Algorithm , 2010, ICML.
[40] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[41] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[42] S. Murphy,et al. Performance Guarantees for Individualized Treatment Rules , 2011, Annals of statistics.
[43] M. Kosorok,et al. Reinforcement Learning Strategies for Clinical Trials in Nonsmall Cell Lung Cancer , 2011, Biometrics.
[44] Inbal Nahum-Shani,et al. Q-learning: a data analysis method for constructing adaptive interventions. , 2012, Psychological methods.
[45] Roman Vershynin,et al. Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.
[46] Donglin Zeng,et al. Estimating Individualized Treatment Rules Using Outcome Weighted Learning , 2012, Journal of the American Statistical Association.
[47] Martin A. Riedmiller,et al. Batch Reinforcement Learning , 2012, Reinforcement Learning.
[48] M. Kosorok,et al. Q-Learning with Censored Data , 2012, Annals of statistics.
[49] Ameet Talwalkar,et al. Foundations of Machine Learning , 2012, Adaptive computation and machine learning.
[50] Eric B. Laber,et al. A Robust Method for Estimating Optimal Treatment Regimes , 2012, Biometrics.
[51] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..
[52] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[53] Michael R. Kosorok,et al. Adaptive Q-learning , 2013 .
[54] B. Chakraborty,et al. Statistical Methods for Dynamic Treatment Regimes: Reinforcement Learning, Causal Inference, and Personalized Medicine , 2013 .
[55] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[56] Eric B. Laber,et al. Dynamic treatment regimes: Technical challenges and applications , 2014 .
[57] Anastasios A. Tsiatis,et al. Q- and A-learning Methods for Estimating Optimal Dynamic Treatment Regimes , 2012, Statistical science : a review journal of the Institute of Mathematical Statistics.
[58] Bruno Scherrer,et al. Rate of Convergence and Error Bounds for LSTD(λ) , 2014, ICML 2015.
[59] Ryota Tomioka,et al. Norm-Based Capacity Control in Neural Networks , 2015, COLT.
[60] Shalabh Bhatnagar,et al. Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games , 2015, AAMAS.
[62] Richard Evans,et al. Deep Reinforcement Learning in Large Discrete Action Spaces , 2015, 1512.07679.
[63] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[64] Ruslan Salakhutdinov,et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks , 2015, NIPS.
[65] Matthieu Geist,et al. Approximate modified policy iteration and its application to the game of Tetris , 2015, J. Mach. Learn. Res..
[66] Donglin Zeng,et al. New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes , 2015, Journal of the American Statistical Association.
[67] Hassan Foroosh,et al. Sparse Convolutional Neural Networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[69] Bruno Scherrer,et al. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games , 2015, ICML.
[70] Peter Stone,et al. Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.
[71] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[72] M. R. Kosorok,et al. Penalized Q-Learning for Dynamic Treatment Regimens , 2011, Statistica Sinica.
[73] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[74] Bruno Scherrer,et al. On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games , 2016, AISTATS.
[75] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[76] Jason M. Klusowski,et al. Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks , 2016, 1607.01434.
[77] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[78] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[79] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[80] Matthieu Geist,et al. Softened Approximate Policy Iteration for Markov Games , 2016, ICML.
[81] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[82] Shie Mannor,et al. Regularized Policy Iteration with Nonparametric Function Spaces , 2016, J. Mach. Learn. Res..
[84] Johannes Schmidt-Hieber,et al. Nonparametric regression using deep neural networks with ReLU activation function , 2017, The Annals of Statistics.
[85] Marcin Andrychowicz,et al. Hindsight Experience Replay , 2017, NIPS.
[86] Leslie Pack Kaelbling,et al. Generalization in Deep Learning , 2017, ArXiv.
[87] Anil A. Bharath,et al. Deep Reinforcement Learning: A Brief Survey , 2017, IEEE Signal Processing Magazine.
[88] Gintare Karolina Dziugaite,et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data , 2017, UAI.
[89] Shie Mannor,et al. Shallow Updates for Deep Reinforcement Learning , 2017, NIPS.
[90] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[91] Matus Telgarsky,et al. Spectrally-normalized margin bounds for neural networks , 2017, NIPS.
[92] Francis R. Bach,et al. On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions , 2015, J. Mach. Learn. Res..
[93] Michael R Kosorok,et al. Residual Weighted Learning for Estimating Individualized Treatment Rules , 2015, Journal of the American Statistical Association.
[94] Richard S. Sutton,et al. A Deeper Look at Experience Replay , 2017, ArXiv.
[95] Jiliang Tang,et al. A Survey on Dialogue Systems: Recent Advances and New Frontiers , 2017, SIGKDD Explorations.
[96] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[97] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[98] Nathan Srebro,et al. Exploring Generalization in Deep Learning , 2017, NIPS.
[99] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[100] Marcello Restelli,et al. Boosted Fitted Q-Iteration , 2017, ICML.
[101] Chen-Yu Wei,et al. Online Reinforcement Learning in Stochastic Games , 2017, NIPS.
[102] Eric B. Laber,et al. Interactive Q-Learning for Quantiles , 2017, Journal of the American Statistical Association.
[103] James Zou,et al. The Effects of Memory Replay in Reinforcement Learning , 2017, 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[104] Ohad Shamir,et al. Size-Independent Sample Complexity of Neural Networks , 2017, COLT.
[105] David A. McAllester,et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks , 2017, ICLR.
[106] Chengchun Shi,et al. High-Dimensional A-Learning for Optimal Dynamic Treatment Regimes , 2018, Annals of statistics.
[107] Michael H. Bowling,et al. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments , 2018, NeurIPS.
[108] Francis Bach,et al. A Note on Lazy Training in Supervised Differentiable Programming , 2018, ArXiv.
[109] Rémi Munos,et al. Implicit Quantile Networks for Distributional Reinforcement Learning , 2018, ICML.
[110] Andrew R. Barron,et al. Approximation and Estimation for High-Dimensional Deep Learning Networks , 2018, ArXiv.
[111] Tamer Basar,et al. Finite-Sample Analyses for Fully Decentralized Multi-Agent Reinforcement Learning , 2018, ArXiv.
[112] Olivier Pietquin,et al. Actor-Critic Fictitious Play in Simultaneous Move Multistage Games , 2018, AISTATS.
[113] Arthur Jacot,et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks , 2018, NeurIPS.
[114] Marc G. Bellemare,et al. Distributional Reinforcement Learning with Quantile Regression , 2017, AAAI.
[115] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[116] Petros Koumoutsakos,et al. Remember and Forget for Experience Replay , 2018, ICML.
[117] Rui Song,et al. Proper Inference for Value Function in High-Dimensional Q-Learning for Dynamic Treatment Regimes , 2018, Journal of the American Statistical Association.
[118] Ruosong Wang,et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.
[119] Joelle Pineau,et al. Benchmarking Batch Deep Reinforcement Learning Algorithms , 2019, ArXiv.
[120] Michael Carbin,et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.
[121] Jiming Liu,et al. Reinforcement Learning in Healthcare: A Survey , 2019, ACM Comput. Surv..
[122] Yuan Cao,et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks , 2018, ArXiv.
[123] Qi Cai,et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy , 2019, ArXiv.
[124] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[126] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[127] Julien Mairal,et al. On the Inductive Bias of Neural Tangent Kernels , 2019, NeurIPS.
[128] M. Kohler,et al. On deep learning as a remedy for the curse of dimensionality in nonparametric regression , 2019, The Annals of Statistics.
[129] Yuanzhi Li,et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.
[130] Francis Bach,et al. On Lazy Training in Differentiable Programming , 2018, NeurIPS.
[131] Zhiyuan Xu,et al. Learning the Dynamic Treatment Regimes from Medical Registry Data through Deep Q-network , 2019, Scientific Reports.
[132] Greg Yang,et al. Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation , 2019, ArXiv.
[133] Peter L. Bartlett,et al. Nearly-tight VC-dimension and Pseudodimension Bounds for Piecewise Linear Neural Networks , 2017, J. Mach. Learn. Res..
[134] Taiji Suzuki,et al. Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality , 2018, ICLR.
[135] Yuan Cao,et al. A Generalization Theory of Gradient Descent for Learning Over-parameterized Deep ReLU Networks , 2019, ArXiv.
[136] Anastasios A. Tsiatis,et al. Dynamic Treatment Regimes , 2019 .
[137] Greg Yang,et al. A Fine-Grained Spectral Perspective on Neural Networks , 2019, ArXiv.
[138] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[139] Matthieu Geist,et al. A Theory of Regularized Markov Decision Processes , 2019, ICML.
[140] Dale Schuurmans,et al. Striving for Simplicity in Off-policy Deep Reinforcement Learning , 2019, ArXiv.
[141] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[142] Yuan Cao,et al. Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks , 2019, NeurIPS.
[143] Tomaso A. Poggio,et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks , 2017, AISTATS.
[144] J. Lee,et al. Neural Temporal-Difference Learning Converges to Global Optima , 2019, NeurIPS.
[145] Gilad Yehudai,et al. On the Power and Limitations of Random Features for Understanding Neural Networks , 2019, NeurIPS.
[146] Andrea Montanari,et al. Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit , 2019, COLT.
[147] Cho-Jui Hsieh,et al. Convergence of Adversarial Training in Overparametrized Neural Networks , 2019, NeurIPS.
[148] Jason D. Lee,et al. Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks , 2019, ICLR.
[149] Tuo Zhao,et al. Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? - A Neural Tangent Kernel Perspective , 2020, NeurIPS.
[150] Quanquan Gu,et al. A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation , 2019, ICML.
[151] Quanquan Gu,et al. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks , 2019, AAAI.
[152] Rishabh Agarwal,et al. An Optimistic Perspective on Offline Reinforcement Learning , 2019, ICML.
[153] Lei Wu,et al. A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics , 2019, Science China Mathematics.
[154] Zhaoran Wang,et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence , 2019, ICLR.
[155] Cong Ma,et al. A Selective Overview of Deep Learning , 2019, Statistical science : a review journal of the Institute of Mathematical Statistics.
[156] Kaiqing Zhang,et al. Finite-Sample Analysis for Decentralized Batch Multiagent Reinforcement Learning With Networked Agents , 2018, IEEE Transactions on Automatic Control.