暂无分享,去创建一个
[1] Jeff G. Schneider,et al. Policy Search by Dynamic Programming , 2003, NIPS.
[2] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[3] Andrew Chi-Chih Yao,et al. Probabilistic computations: Toward a unified measure of complexity , 1977, 18th Annual Symposium on Foundations of Computer Science (sfcs 1977).
[4] Sham M. Kakade,et al. On the sample complexity of reinforcement learning. , 2003 .
[5] Matthieu Geist,et al. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search , 2014, ECML/PKDD.
[6] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[7] Sham M. Kakade,et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift , 2019, J. Mach. Learn. Res..
[8] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
[9] Sanjoy Dasgupta,et al. An elementary proof of a theorem of Johnson and Lindenstrauss , 2003, Random Struct. Algorithms.
[10] Lin F. Yang,et al. Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model , 2018, 1806.01492.
[11] Noga Alon,et al. Perturbed Identity Matrices Have High Rank: Proof and Applications , 2009, Combinatorics, Probability and Computing.
[12] Nan Jiang,et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable , 2016, ICML.
[13] Nan Jiang,et al. Information-Theoretic Considerations in Batch Reinforcement Learning , 2019, ICML.
[14] Baruch Awerbuch,et al. Online linear optimization and adaptive routing , 2008, J. Comput. Syst. Sci..
[15] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[16] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[17] Nan Jiang,et al. Provably efficient RL with Rich Observations via Latent State Decoding , 2019, ICML.
[18] Avi Wigderson,et al. Rank bounds for design matrices with applications to combinatorial geometry and locally correctable codes , 2010, STOC '11.
[19] Nan Jiang,et al. Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches , 2018, COLT.
[20] Max Simchowitz,et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs , 2019, NeurIPS.
[21] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.
[22] Ruosong Wang,et al. Classical Algorithms from Quantum and Arthur-Merlin Communication Protocols , 2019, ITCS.
[23] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[24] Noga Alon,et al. The approximate rank of a matrix and its algorithmic applications: approximate rank , 2013, STOC '13.
[25] Csaba Szepesvári,et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path , 2006, COLT.
[26] Byron Boots,et al. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction , 2017, ICML.
[27] Matthieu Geist,et al. A Theory of Regularized Markov Decision Processes , 2019, ICML.
[28] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[29] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[30] Mengdi Wang,et al. Reinforcement Leaning in Feature Space: Matrix Bandit, Kernels, and Regret Bound , 2019, ICML.
[31] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[32] John Langford,et al. PAC Reinforcement Learning with Rich Observations , 2016, NIPS.
[33] Rémi Munos,et al. Error Bounds for Approximate Value Iteration , 2005, AAAI.
[34] Csaba Szepesvari,et al. Regularization in reinforcement learning , 2011 .
[35] Ruosong Wang,et al. Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle , 2019, NeurIPS.
[36] Sham M. Kakade,et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes , 2019, COLT.
[37] W. B. Johnson,et al. Extensions of Lipschitz mappings into Hilbert space , 1984 .
[38] Noga Alon,et al. The Cover Number of a Matrix and its Algorithmic Applications , 2014, APPROX-RANDOM.
[39] Shai Ben-David,et al. Understanding Machine Learning: From Theory to Algorithms , 2014 .
[40] Bruno Scherrer,et al. Approximate Policy Iteration Schemes: A Comparison , 2014, ICML.
[41] Nan Jiang,et al. On Polynomial Time PAC Reinforcement Learning with Rich Observations , 2018, ArXiv.
[42] Tor Lattimore,et al. Learning with Good Feature Representations in Bandits and in RL with a Generative Model , 2020, ICML.
[43] Zheng Wen,et al. Efficient Exploration and Value Function Generalization in Deterministic Systems , 2013, NIPS.
[44] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[45] G. Lorentz. Metric entropy and approximation , 1966 .
[46] Thomas J. Walsh,et al. Knows what it knows: a framework for self-aware learning , 2008, ICML '08.
[47] Benjamin Van Roy,et al. Comments on the Du-Kakade-Wang-Yang Lower Bounds , 2019, ArXiv.